Abstract. Simulation and bisimulation metrics for stochastic systems provide a quantitative
generalization of the classical simulation and bisimulation relations. These metrics
capture the similarity of states with respect to quantitative specifications written in the
quantitative μ-calculus and related probabilistic logics. We first show that the metrics
provide a bound for the difference in long-run average and discounted average behavior
across states, indicating that the metrics can be used both in system verification and in
performance evaluation. For turn-based games and MDPs, we provide a polynomial-time
algorithm for the computation of the one-step metric distance between states. The algorithm
is based on linear programming; it improves on the previously known exponential-time
algorithm based on a reduction to the theory of reals. We then present PSPACE algorithms
for both the decision problem and the problem of approximating the metric distance
between two states, matching the best known algorithms for Markov chains. For the
bisimulation kernel of the metric, our algorithm works in time $O(n^4)$ for both turn-based games
and MDPs, improving the previously best known $O(n^9 \cdot \log(n))$ time algorithm for MDPs.

For a concurrent game $G$, we show that computing the exact distance between states is
at least as hard as computing the value of concurrent reachability games and the
square-root-sum problem in computational geometry. We show that checking whether the metric
distance is bounded by a rational $r$ can be done via a reduction to the theory of real closed
fields, involving a formula with three quantifier alternations, yielding $O(|G|^{O(|G|^5)})$ time
complexity, improving the previously known reduction, which yielded $O(|G|^{O(|G|^7)})$ time
complexity. These algorithms can be iterated to approximate the metrics using binary
search.
1. Introduction

System metrics constitute a quantitative generalization of system relations. The bisimulation
relation captures state equivalence: two states s and t are bisimilar if and only if
they cannot be distinguished by any formula of the μ-calculus [5]. The bisimulation metric
captures the degree of difference between two states: the bisimulation distance between s
and t is a real number that provides a tight bound for the difference in value of formulas
of the quantitative μ-calculus at s and t [12]. A similar connection holds between the
simulation relation and the simulation metric.

The classical system relations are a basic tool in the study of boolean properties of
systems, that is, the properties that yield a truth value. As an example, if a state s of a
transition system can reach a set of target states R, written $s \models \Diamond R$ in temporal logic,
and t can simulate s, then we can conclude $t \models \Diamond R$. System metrics play a similarly
fundamental role in the study of the quantitative behavior of systems. As an example,
if a state s of a Markov chain can reach a set of target states R with probability 0.8,
written $s \models P_{\geq 0.8} \Diamond R$, and if the metric simulation distance from t to s is 0.3, then we can
conclude $t \models P_{\geq 0.5} \Diamond R$. The simulation relation is at the basis of the notions of system
refinement and implementation, where qualitative properties are concerned. In analogous
fashion, simulation metrics provide a notion of approximate refinement and implementation
for quantitative properties.

We consider three classes of systems: 

• Markov decision processes. In these systems there is one player. At each state, the 
player can choose a move; the current state and the move determine a probability 
distribution over the successor states. 

• Turn-based games. In these systems there are two players. At each state, only one 
of the two players can choose a move; the current state and the move determine a 
probability distribution over the successor states. 

• Concurrent games. In these systems there are two players. At each state, both 
players choose moves simultaneously and independently; the current state and the 
chosen moves determine a probability distribution over the successor states. 

System metrics were first studied for Markov chains and Markov decision processes (MDPs)
[12], [32], [33], [13], [Ti], and they have recently been extended to two-player turn-based and
concurrent games [11]. The fundamental property of the metrics is that they provide a
tight bound for the difference in value that formulas belonging to quantitative specification
languages assume at the states of a system. More precisely, let $q\mu$ indicate the quantitative
μ-calculus, a specification language in which many of the classical specification properties,
including reachability and safety properties, can be written [10]. The metric bisimulation
distance between two states s and t, denoted $[s \simeq_g t]$, has the property that

$$[s \simeq_g t] = \sup_{\varphi \in q\mu} |\varphi(s) - \varphi(t)| ,$$

where $\varphi(s)$ and $\varphi(t)$ are the values $\varphi$ assumes at s and t. To each
metric is associated a kernel: the kernel of a metric $d$ is the relation that relates the pairs of
states that have distance 0. The kernel of the simulation metric is probabilistic simulation;
the kernel of the bisimulation metric is probabilistic bisimulation [27].

Metric as bound for discounted and long-run average payoff. Our first result is
that the metrics developed in [11] provide a bound for the difference in long-run average
and discounted average properties across states of a system. These average rewards play a
central role in the theory of stochastic games, and in its applications to optimal control and
economics [HE]. Thus, the metrics of [11] are useful both for system verification and for
performance evaluation, supporting our belief that they constitute the canonical metrics for
the study of the similarity of states in a game. We point out that it is possible to define
a discounted version $[\simeq_g]^\alpha$ of the game bisimulation metric; however, we show that this
discounted metric does not provide a bound for the difference in discounted values.

Algorithmic results. Next, we investigate algorithms for the computation of the metrics.
The metrics can be computed in iterative fashion, following the inductive way in which they
are defined. A metric $d$ can be computed as the limit of a monotonically increasing sequence
of approximations $d_0, d_1, d_2, \ldots$, where $d_0(s,t)$ is the difference in value that variables can
have at states s and t. For $k \geq 0$, $d_{k+1}$ is obtained from $d_k$ via $d_{k+1} = H(d_k)$, where the
operator $H$ depends on the metric (bisimulation, or simulation), and on the type of system.
Our main results are as follows:

(1) Metrics for turn-based games and MDPs. We show that for turn-based games, and 
MDPs, the one-step metric operator H for both bisimulation and simulation can 
be computed in polynomial time, via a reduction to linear programming (LP). The 
only previously known algorithm, which can be inferred from [11], had EXPTIME
complexity and relied on a reduction to the theory of real closed fields; the algorithm
thus had complexity-theoretic rather than practical value. The key step
in obtaining our polynomial-time algorithm consists in transforming the original 
sup-inf non-linear optimization problem (which required the theory of reals) into a 
quadratic-size inf linear optimization problem that can be solved via LP. We then 
present PSPACE algorithms for both the decision problem of the metric distance 
between two states and for the problem of computing the approximate metric dis- 
tance between two states for turn-based games and MDPs. Our algorithms match 
the complexity of the best known algorithms for the sub-class of Markov chains [31] . 

(2) Metrics for concurrent games. For concurrent games, our algorithms for the H 
operator still rely on decision procedures for the theory of real closed fields, leading 
to an EXPTIME procedure. However, the algorithms that could be inferred from
[11] had time complexity $O(|G|^{O(|G|^7)})$, where $|G|$ is the size of a game; we improve
this result by presenting algorithms with $O(|G|^{O(|G|^5)})$ time complexity.

(3) Hardness of metric computation in concurrent games. We show that computing the 
exact distance of states of concurrent games is at least as hard as computing the 
value of concurrent reachability games [15], [8], which is known to be at least as hard
as solving the square-root-sum problem in computational geometry [18]. These two 
problems are known to lie in PSPACE, and have resisted many attempts to show 
that they are in NP. 

(4) Kernel of the metrics. We present polynomial-time algorithms to compute the simulation
and bisimulation kernel of the metrics for turn-based games and MDPs. Our
algorithm for the bisimulation kernel of the metric runs in time $O(n^4)$ (assuming a
constant number of moves) as compared to the previously known $O(n^9 \cdot \log(n))$ algorithm
of [35] for MDPs, where $n$ is the size of the state space. For concurrent games
the simulation and the bisimulation kernel can be computed in time $O(|G|^{O(|G|^3)})$,
where $|G|$ is the size of a game.

Our formulation of probabilistic simulation and bisimulation differs from the one previously
considered for MDPs in [2]: there, the names of moves (called "labels") must be
preserved by simulation and bisimulation, so that a move from a state has at most one
candidate simulator move at another state. Our problem for MDPs is closer to the one
considered in [35], where labels must be preserved, but where a label can be associated with
multiple probability distributions (moves).

For turn-based games and MDPs, the algorithms for probabilistic simulation and bisim- 
ulation can be obtained from the LP algorithms that yield the metrics. For probabilistic 
simulation, the algorithm we obtain coincides with the algorithm previously published in 
[35]. The algorithm requires the solution of feasibility-LP problems with a number of 
variables and inequalities that is quadratic in the size of the system. For probabilistic 
bisimulation, we are able to improve on this result by providing an algorithm that requires 
the solution of feasibility-LP problems that have linearly many variables and constraints. 
Precisely, as for ordinary bisimulation, the kernel is computed via iterative refinement of 
a partition of the state space [23]. Given two states that belong to the same partition, to 
decide whether the states need to be split in the next partition-refinement step, we present 
an algorithm that requires the solution of a feasibility-LP problem with a number of vari- 
ables equal to the number of moves available at the states, and number of constraints linear 
in the number of equivalence classes. Overall, our algorithm for bisimulation runs in time
$O(n^4)$ (assuming a constant number of moves), considerably improving the $O(n^9 \cdot \log(n))$
algorithm of [35] for MDPs, and providing for the first time a polynomial algorithm for
turn-based games.

2. Definitions 

Valuations. Let $[\theta_1, \theta_2] \subseteq \mathbb{R}$ be a fixed, non-singleton real interval. Given a set of states
$S$, a valuation over $S$ is a function $f : S \to [\theta_1, \theta_2]$ associating with every state $s \in S$ a
value $\theta_1 \leq f(s) \leq \theta_2$; we let $\mathcal{F}$ be the set of all valuations. For $c \in [\theta_1, \theta_2]$, we denote by $\mathbf{c}$
the constant valuation such that $\mathbf{c}(s) = c$ at all $s \in S$. We order valuations pointwise: for
$f, g \in \mathcal{F}$, we write $f \leq g$ iff $f(s) \leq g(s)$ at all $s \in S$; we remark that $\mathcal{F}$, under $\leq$, forms
a lattice. Given $a, b \in \mathbb{R}$, we write $a \sqcup b = \max\{a, b\}$ and $a \sqcap b = \min\{a, b\}$; we also let
$a \oplus b = \min\{1, \max\{0, a + b\}\}$ and $a \ominus b = \max\{0, \min\{1, a - b\}\}$. We extend $\sqcap, \sqcup, +, -, \oplus, \ominus$
to valuations by interpreting them in pointwise fashion.

Game structures. For a finite set $A$, let $\mathrm{Dist}(A)$ denote the set of probability distributions
over $A$. We say that $p \in \mathrm{Dist}(A)$ is deterministic if there is $a \in A$ such that $p(a) = 1$. We
assume a fixed finite set $\mathcal{V}$ of observation variables.

A (two-player, concurrent) game structure $G = \langle S, [\cdot], Moves, \Gamma_1, \Gamma_2, \delta \rangle$ consists of the
following components [HE]:

• A finite set $S$ of states.

• A variable interpretation $[\cdot] : \mathcal{V} \to [\theta_1, \theta_2]^S$, which associates with each variable
$v \in \mathcal{V}$ a valuation $[v]$.

• A finite set $Moves$ of moves.

• Two move assignments $\Gamma_1, \Gamma_2 : S \to 2^{Moves} \setminus \{\emptyset\}$. For $i \in \{1, 2\}$, the assignment $\Gamma_i$
associates with each state $s \in S$ the nonempty set $\Gamma_i(s) \subseteq Moves$ of moves available
to player $i$ at state $s$.

• A probabilistic transition function $\delta : S \times Moves \times Moves \to \mathrm{Dist}(S)$ that gives the
probability $\delta(s, a_1, a_2)(t)$ of a transition from $s$ to $t$ when player 1 plays move $a_1$
and player 2 plays move $a_2$.
At every state $s \in S$, player 1 chooses a move $a_1 \in \Gamma_1(s)$, and simultaneously and independently
player 2 chooses a move $a_2 \in \Gamma_2(s)$. The game then proceeds to the successor state
$t \in S$ with probability $\delta(s, a_1, a_2)(t)$. We let $\mathrm{Dest}(s, a_1, a_2) = \{t \in S \mid \delta(s, a_1, a_2)(t) > 0\}$.
The propositional distance $p(s, t)$ between two states $s, t \in S$ is the maximum difference in
the valuation of any variable:

$$p(s, t) = \max_{v \in \mathcal{V}} |[v](s) - [v](t)| .$$
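
The definitions so far can be mirrored in a small data structure. The following is only an illustrative sketch: the class name `GameStructure`, its field names, and the example game are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

State = str
Move = str
Dist = Dict[State, float]  # probability distribution over states


@dataclass
class GameStructure:
    """Hypothetical container for G = <S, [.], Moves, Gamma_1, Gamma_2, delta>."""
    states: FrozenSet[State]
    interp: Dict[str, Dict[State, float]]        # variable -> valuation
    gamma1: Dict[State, FrozenSet[Move]]         # moves available to player 1
    gamma2: Dict[State, FrozenSet[Move]]         # moves available to player 2
    delta: Dict[Tuple[State, Move, Move], Dist]  # transition function

    def dest(self, s: State, a1: Move, a2: Move) -> Set[State]:
        """Dest(s, a1, a2): successors reached with positive probability."""
        return {t for t, pr in self.delta[(s, a1, a2)].items() if pr > 0}


def prop_distance(g: GameStructure, s: State, t: State) -> float:
    """p(s, t): maximum difference in the valuation of any variable."""
    return max(abs(val[s] - val[t]) for val in g.interp.values())


# Made-up example: one variable r; player 2 has a single dummy move "-".
g = GameStructure(
    states=frozenset({"s", "t"}),
    interp={"r": {"s": 0.2, "t": 0.7}},
    gamma1={"s": frozenset({"a", "b"}), "t": frozenset({"a"})},
    gamma2={"s": frozenset({"-"}), "t": frozenset({"-"})},
    delta={
        ("s", "a", "-"): {"s": 0.5, "t": 0.5},
        ("s", "b", "-"): {"t": 1.0},
        ("t", "a", "-"): {"t": 1.0},
    },
)
```

In this example player 2 never has a choice, so the structure is a 1-MDP in the terminology of this section.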

The kernel of the propositional distance induces an equivalence on states: for states $s, t$, we
let $s \equiv t$ if $p(s,t) = 0$. In the following, unless otherwise noted, the definitions refer to a
game structure with components $G = \langle S, [\cdot], Moves, \Gamma_1, \Gamma_2, \delta \rangle$. We indicate the opponent of
a player $i \in \{1, 2\}$ by $\sim i = 3 - i$. We consider the following subclasses of game structures.

Turn-based game structures. A game structure $G$ is turn-based if we can write $S =
S_1 \cup S_2$ with $S_1 \cap S_2 = \emptyset$, where $s \in S_1$ implies $|\Gamma_2(s)| = 1$, and $s \in S_2$ implies $|\Gamma_1(s)| = 1$,
and further, there exists a special variable $turn \in \mathcal{V}$ such that $[turn](s) = \theta_1$ iff $s \in S_1$, and
$[turn](s) = \theta_2$ iff $s \in S_2$.

Markov decision processes. For $i \in \{1, 2\}$, we say that a structure is an $i$-MDP if
for all $s \in S$, $|\Gamma_{\sim i}(s)| = 1$. For MDPs, we omit the (single) move of the player without a choice
of moves, and write $\delta(s, a)$ for the transition function.

Moves and strategies. A mixed move is a probability distribution over the moves available
to a player at a state. We denote by $\mathcal{D}_i(s) \subseteq \mathrm{Dist}(Moves)$ the set of mixed moves available
to player $i \in \{1, 2\}$ at $s \in S$, where:

$$\mathcal{D}_i(s) = \{\mathcal{D} \in \mathrm{Dist}(Moves) \mid \mathcal{D}(a) > 0 \text{ implies } a \in \Gamma_i(s)\} .$$

The moves in $Moves$ are called pure moves. We extend the transition function to mixed
moves by defining, for $s \in S$ and $x_1 \in \mathcal{D}_1(s)$, $x_2 \in \mathcal{D}_2(s)$,

$$\delta(s, x_1, x_2)(t) = \sum_{a_1 \in \Gamma_1(s)} \sum_{a_2 \in \Gamma_2(s)} \delta(s, a_1, a_2)(t) \cdot x_1(a_1) \cdot x_2(a_2) .$$
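
The extension of the transition function to mixed moves is a double sum over pure moves, which can be sketched directly. The function name `mixed_transition`, the dictionary encoding of distributions, and the example numbers are assumptions made for illustration.

```python
def mixed_transition(delta, s, x1, x2):
    """Return delta(s, x1, x2) as a dict mapping state -> probability.

    delta maps (state, move1, move2) to a dict state -> probability;
    x1 and x2 are mixed moves, encoded as dicts move -> probability.
    """
    out = {}
    for a1, p1 in x1.items():
        for a2, p2 in x2.items():
            if p1 == 0 or p2 == 0:
                continue  # moves outside the support contribute nothing
            for t, pr in delta[(s, a1, a2)].items():
                out[t] = out.get(t, 0.0) + pr * p1 * p2
    return out


# Made-up state s where the successor depends on whether the moves "match".
delta = {
    ("s", "a", "c"): {"u": 1.0},
    ("s", "a", "d"): {"v": 1.0},
    ("s", "b", "c"): {"v": 1.0},
    ("s", "b", "d"): {"u": 1.0},
}
x1 = {"a": 0.5, "b": 0.5}
x2 = {"c": 0.25, "d": 0.75}
dist = mixed_transition(delta, "s", x1, x2)
```

Because player 1 mixes uniformly here, the result is the uniform distribution over `u` and `v` regardless of player 2's mixed move.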

A path $\sigma$ of $G$ is an infinite sequence $s_0, s_1, s_2, \ldots$ of states in $S$, such that for all $k \geq 0$,
there exist moves $a_1^k \in \Gamma_1(s_k)$ and $a_2^k \in \Gamma_2(s_k)$ with $\delta(s_k, a_1^k, a_2^k)(s_{k+1}) > 0$. We write $\Sigma$ for
the set of all paths, and $\Sigma_s$ for the set of all paths starting from state $s$.

A strategy for player $i \in \{1, 2\}$ is a function $\pi_i : S^+ \to \mathrm{Dist}(Moves)$ that associates with
every non-empty finite sequence $\sigma \in S^+$ of states, representing the history of the game, a
probability distribution $\pi_i(\sigma)$, which is used to select the next move of player $i$; we require
that for all $\sigma \in S^*$ and states $s \in S$, if $\pi_i(\sigma s)(a) > 0$, then $a \in \Gamma_i(s)$. We write $\Pi_i$ for
the set of strategies for player $i$. Once the starting state $s$ and the strategies $\pi_1$ and $\pi_2$ for
the two players have been chosen, the game is reduced to an ordinary stochastic process,
denoted $G_s^{\pi_1, \pi_2}$, which defines a probability distribution on the set $\Sigma$ of paths. We denote by
$\Pr_s^{\pi_1, \pi_2}(\cdot)$ the probability of a measurable event (a set of paths) with respect to this process,
and denote by $E_s^{\pi_1, \pi_2}(\cdot)$ the associated expectation operator. For $k \geq 0$, we let $X_k : \Sigma \to S$
be the random variable denoting the $k$-th state along a path.

One-step expectations and predecessor operators. Given a valuation $f \in \mathcal{F}$, a state
$s \in S$, and two mixed moves $x_1 \in \mathcal{D}_1(s)$ and $x_2 \in \mathcal{D}_2(s)$, we define the expectation of $f$
from $s$ under $x_1, x_2$ by

$$E_s^{x_1, x_2}(f) = \sum_{t \in S} \delta(s, x_1, x_2)(t) \cdot f(t) .$$

For a game structure $G$, for $i \in \{1, 2\}$ we define the valuation transformer $\mathrm{Pre}_i : \mathcal{F} \to \mathcal{F}$
by, for all $f \in \mathcal{F}$ and $s \in S$,

$$\mathrm{Pre}_i(f)(s) = \sup_{x_i \in \mathcal{D}_i(s)} \inf_{x_{\sim i} \in \mathcal{D}_{\sim i}(s)} E_s^{x_1, x_2}(f) .$$

Intuitively, $\mathrm{Pre}_i(f)(s)$ is the maximal expectation player $i$ can achieve of $f$ after one step
from $s$: this is the standard "one-day" or "next-stage" operator of the theory of repeated
games [PT].
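
When at least one player has a single move at a state, as in turn-based games and MDPs, the sup-inf over mixed moves in the predecessor operator is attained by pure moves, because the expectation is linear in each mixed move. The sketch below covers only that restricted case; the names `expectation` and `pre1_pure` and the example data are hypothetical. For general concurrent games the same expression yields only a lower bound on the value.

```python
def expectation(delta, f, s, a1, a2):
    """One-step expectation of valuation f from s under pure moves a1, a2."""
    return sum(pr * f[t] for t, pr in delta[(s, a1, a2)].items())


def pre1_pure(delta, gamma1, gamma2, f, s):
    """Max over a1 of min over a2 of the one-step expectation of f.

    Equals Pre_1(f)(s) when at least one player has a single move at s
    (turn-based games, MDPs); in general it is only a lower bound.
    """
    return max(
        min(expectation(delta, f, s, a1, a2) for a2 in gamma2[s])
        for a1 in gamma1[s]
    )


# Made-up turn-based example: s is a player-1 state, q a player-2 state.
delta = {
    ("s", "a", "-"): {"u": 0.3, "v": 0.7},
    ("s", "b", "-"): {"u": 0.6, "v": 0.4},
    ("q", "-", "c"): {"u": 0.9, "v": 0.1},
    ("q", "-", "d"): {"u": 0.2, "v": 0.8},
}
gamma1 = {"s": ["a", "b"], "q": ["-"]}
gamma2 = {"s": ["-"], "q": ["c", "d"]}
f = {"u": 1.0, "v": 0.0}

pre_s = pre1_pure(delta, gamma1, gamma2, f, "s")  # player 1 maximizes
pre_q = pre1_pure(delta, gamma1, gamma2, f, "q")  # player 2 minimizes
```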



2.1. Quantitative μ-calculus. We consider the set of properties expressed by the quantitative
μ-calculus ($q\mu$). As discussed in [20], [10], [22], a large set of properties can be encoded
in $q\mu$, spanning from basic properties such as maximal reachability and safety probability,
to the maximal probability of satisfying a general ω-regular specification.

Syntax. The syntax of quantitative μ-calculus is defined with respect to the set of observation
variables $\mathcal{V}$ as well as a set $MVars$ of calculus variables, which are distinct from the
observation variables in $\mathcal{V}$. The syntax is given as follows:

$$\varphi ::= c \mid v \mid V \mid \neg\varphi \mid \varphi \vee \varphi \mid \varphi \wedge \varphi \mid \varphi \oplus c \mid \varphi \ominus c \mid \mathrm{pre}_1(\varphi) \mid \mathrm{pre}_2(\varphi) \mid \mu V.\, \varphi \mid \nu V.\, \varphi$$

for constants $c \in [\theta_1, \theta_2]$, observation variables $v \in \mathcal{V}$, and calculus variables $V \in MVars$.
In the formulas $\mu V.\, \varphi$ and $\nu V.\, \varphi$, we furthermore require that all occurrences of the bound
variable $V$ in $\varphi$ occur in the scope of an even number of occurrences of the complement
operator $\neg$. A formula $\varphi$ is closed if every calculus variable $V$ in $\varphi$ occurs in the scope of
a quantifier $\mu V$ or $\nu V$. From now on, with abuse of notation, we denote by $q\mu$ the set of
closed formulas of $q\mu$. A formula is a player $i$ formula, for $i \in \{1, 2\}$, if it does not contain
the $\mathrm{pre}_{\sim i}$ operator; we denote with $q\mu_i$ the syntactic subset of $q\mu$ consisting only of closed
player $i$ formulas. A formula is in positive form if the negation appears only in front of
constants and observation variables, i.e., in the context $\neg c$ and $\neg v$; we denote with $q\mu^+$ and
$q\mu_i^+$ the subsets of $q\mu$ and $q\mu_i$ consisting only of positive formulas.

We remark that the fixpoint operators $\mu$ and $\nu$ will not be needed to achieve our
results on the logical characterization of game relations. They have been included in the
calculus because they allow the expression of many interesting properties, such as safety,
reachability, and in general, ω-regular properties. The operators $\oplus$ and $\ominus$, on the other
hand, are necessary for our results.

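Semantics. A variable valuation $\xi : MVars \to \mathcal{F}$ is a function that maps every variable
$V \in MVars$ to a valuation in $\mathcal{F}$. We write $\xi[V \mapsto f]$ for the valuation that agrees with $\xi$
on all variables, except that $V$ is mapped to $f$. Given a game structure $G$ and a variable
valuation $\xi$, every formula $\varphi$ of the quantitative μ-calculus defines a valuation $[\![\varphi]\!]^G_\xi \in \mathcal{F}$
(the superscript $G$ is omitted if the game structure is clear from the context):

$$[\![c]\!]_\xi = \mathbf{c} \qquad [\![v]\!]_\xi = [v]$$
$$[\![V]\!]_\xi = \xi(V) \qquad [\![\neg\varphi]\!]_\xi = 1 - [\![\varphi]\!]_\xi$$
$$[\![\varphi \,\{\oplus, \ominus\}\, c]\!]_\xi = [\![\varphi]\!]_\xi \,\{\oplus, \ominus\}\, \mathbf{c} \qquad [\![\varphi_1 \,\{\vee, \wedge\}\, \varphi_2]\!]_\xi = [\![\varphi_1]\!]_\xi \,\{\sqcup, \sqcap\}\, [\![\varphi_2]\!]_\xi$$
$$[\![\mathrm{pre}_i(\varphi)]\!]_\xi = \mathrm{Pre}_i([\![\varphi]\!]_\xi) \qquad [\![\{\mu, \nu\} V.\, \varphi]\!]_\xi = \{\inf, \sup\} \{f \in \mathcal{F} \mid f = [\![\varphi]\!]_{\xi[V \mapsto f]}\}$$

where $i \in \{1, 2\}$. The existence of the fixpoints is guaranteed by the monotonicity and
continuity of all operators, and the fixpoints can be computed by Picard iteration [10]. If $\varphi$ is closed,
$[\![\varphi]\!]_\xi$ is independent of $\xi$, and we write simply $[\![\varphi]\!]$.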
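
To illustrate how a least fixpoint is approached by Picard iteration, the sketch below evaluates a reachability formula of the shape $\mu V.\,(v \vee \mathrm{pre}_1(V))$ on a Markov chain, where $\mathrm{Pre}_1$ reduces to a plain expectation. It assumes the interval $[0, 1]$ and a made-up four-state chain; the function name `lfp_reach` and the stopping tolerance are hypothetical choices, not part of the paper.

```python
def lfp_reach(succ, target, tol=1e-12, max_iter=10_000):
    """Picard iteration for f = target JOIN Pre(f), starting from the bottom valuation.

    succ maps each state to a dict successor -> probability (a Markov chain);
    target is the valuation [v], equal to 1.0 on target states and 0.0 elsewhere.
    """
    states = list(succ)
    f = {s: 0.0 for s in states}  # constant-0 valuation (bottom of the lattice)
    for _ in range(max_iter):
        g = {
            s: max(target[s], sum(p * f[t] for t, p in succ[s].items()))
            for s in states
        }
        if max(abs(g[s] - f[s]) for s in states) < tol:
            return g
        f = g
    return f


# Made-up chain: from a, reach goal with prob. 1/2 or move to b; from b, return
# to a with prob. 1/2 or get stuck in sink.
succ = {
    "a": {"goal": 0.5, "b": 0.5},
    "b": {"a": 0.5, "sink": 0.5},
    "goal": {"goal": 1.0},
    "sink": {"sink": 1.0},
}
target = {"a": 0.0, "b": 0.0, "goal": 1.0, "sink": 0.0}
val = lfp_reach(succ, target)
```

Solving the two linear equations by hand gives reachability probability 2/3 from `a` and 1/3 from `b`, which the iteration approximates from below.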
Discounted quantitative μ-calculus. A discounted version of the μ-calculus was introduced
in [9]; we call this $d\mu$. Let $\Lambda$ be a finite set of discount parameters that take values
in the interval $[0, 1)$. The discounted μ-calculus extends $q\mu$ by introducing discounted versions
of the player pre modalities. The syntax replaces $\mathrm{pre}_i(\varphi)$ for player $i \in \{1, 2\}$ with
its discounted variant, $\lambda \cdot \mathrm{pre}_i(\varphi)$, where $\lambda \in \Lambda$ is a discount factor that discounts one-step
valuations. Negation in the calculus is defined as $\neg(\lambda \cdot \mathrm{pre}_1(\varphi)) = (1 - \lambda) + \lambda \cdot \mathrm{pre}_2(\neg\varphi)$.
This leads to two additional pre-modalities for the players, $(1 - \lambda) + \lambda \cdot \mathrm{pre}_i(\varphi)$.

2.2. Game bisimulation and simulation metrics. A directed metric is a function $d :
S^2 \to \mathbb{R}_{\geq 0}$ which satisfies $d(s,s) = 0$ and the triangle inequality $d(s,t) \leq d(s,u) + d(u,t)$
for all $s, t, u \in S$. We denote by $\mathcal{M} \subseteq S^2 \to \mathbb{R}$ the space of all directed metrics; this
space, ordered pointwise, forms a lattice which we indicate with $(\mathcal{M}, \leq)$. Since $d(s,t)$ may
be zero for $s \neq t$, these functions are pseudo-metrics as per prevailing terminology [32]. In
the following, we omit "directed" and simply say metric when the context is clear. For a
metric $d$, we indicate with $C(d)$ the set of valuations $k \in \mathcal{F}$ where $k(s) - k(t) \leq d(s,t)$ for
every $s, t \in S$. A metric transformer $H_{\preceq_1} : \mathcal{M} \to \mathcal{M}$ is defined as follows, for all $d \in \mathcal{M}$
and $s, t \in S$:

$$H_{\preceq_1}(d)(s,t) = p(s,t) \sqcup \sup_{k \in C(d)} \bigl(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\bigr) . \tag{2.1}$$

The player 1 game simulation metric $[\preceq_1]$ is the least fixpoint of $H_{\preceq_1}$; the game bisimulation
metric $[\simeq_1]$ is the least symmetrical fixpoint of $H_{\simeq_1}$, which is defined as follows, for all $d \in \mathcal{M}$
and $s, t \in S$:

$$H_{\simeq_1}(d)(s,t) = H_{\preceq_1}(d)(s,t) \sqcup H_{\preceq_1}(d)(t,s) . \tag{2.2}$$

The operator $H_{\preceq_1}$ is monotonic, non-decreasing and continuous in the lattice $(\mathcal{M}, \leq)$. We
can therefore compute the least fixpoint of $H_{\preceq_1}$ using Picard iteration; we denote by
$[\preceq_1^n] = H_{\preceq_1}^n(\mathbf{0})$ the $n$-th iterate. From the determinacy of concurrent games with respect to
ω-regular goals [21], we have that the game bisimulation metric is reciprocal, in that
$[\simeq_1] = [\simeq_2]$; we will thus simply write $[\simeq_g]$. Similarly, for all $s, t \in S$ we have
$[s \preceq_1 t] = [t \preceq_2 s]$.

The main result in [11] about these metrics is that they are logically characterized by the
quantitative μ-calculus of [10]. We omit the formal definition of the syntax and semantics of
the quantitative μ-calculus; we refer the reader to [10] for details. Given a game structure
$G$, every closed formula $\varphi$ of the quantitative μ-calculus defines a valuation $[\![\varphi]\!] \in \mathcal{F}$.
Let $q\mu$ (respectively, $q\mu_1^+$) consist of all quantitative μ-calculus formulas (respectively, all
quantitative μ-calculus formulas with only the $\mathrm{pre}_1$ operator and all negations before atomic
propositions). The result of [11] shows that for all states $s, t \in S$,

$$[s \preceq_1 t] = \sup_{\varphi \in q\mu_1^+} \bigl([\![\varphi]\!](s) - [\![\varphi]\!](t)\bigr) \qquad [s \simeq_g t] = \sup_{\varphi \in q\mu} \bigl|[\![\varphi]\!](s) - [\![\varphi]\!](t)\bigr| . \tag{2.3}$$

Metrics for the discounted quantitative μ-calculus. We call $d\mu^\alpha$ the discounted
μ-calculus with all discount parameters $\leq \alpha$. We define the discounted metrics via an
$\alpha$-discounted metric transformer $H^\alpha_{\preceq_1} : \mathcal{M} \to \mathcal{M}$, defined for all $d \in \mathcal{M}$ and all $s, t \in S$ by:

$$H^\alpha_{\preceq_1}(d)(s,t) = p(s,t) \sqcup \alpha \cdot \sup_{k \in C(d)} \bigl(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\bigr) . \tag{2.4}$$

Again, $H^\alpha_{\preceq_1}$ is continuous and monotonic in the lattice $(\mathcal{M}, \leq)$. The $\alpha$-discounted simulation
metric $[\preceq_1]^\alpha$ is the least fixpoint of $H^\alpha_{\preceq_1}$, and the $\alpha$-discounted bisimulation metric $[\simeq_1]^\alpha$ is
the least symmetrical fixpoint of $H^\alpha_{\simeq_1}$. The following result follows easily by induction on
the Picard iterations used to compute the distances [9]; for all states $s, t \in S$ and a discount
factor $\alpha \in [0, 1)$,

$$[s \preceq_1 t]^\alpha \leq [s \preceq_1 t] \qquad [s \simeq_1 t]^\alpha \leq [s \simeq_1 t] . \tag{2.5}$$

Using techniques similar to the undiscounted case, we can prove that for every game structure
$G$ and discount factor $\alpha \in [0, 1)$, the fixpoint $[\preceq_i]^\alpha$ is a directed metric and $[\simeq_i]^\alpha$
is a metric, and that they are reciprocal, i.e., $[\preceq_1]^\alpha = [\succeq_2]^\alpha$ and $[\simeq_1]^\alpha = [\simeq_2]^\alpha$. Given that
the discounted bisimulation metric coincides for the two players, we write $[\simeq_g]^\alpha$ instead of
$[\simeq_1]^\alpha$ and $[\simeq_2]^\alpha$. We now state without proof that the discounted μ-calculus provides a
logical characterization of the discounted metric. The proof is based on induction on the
structure of formulas, and closely follows the result for the undiscounted case [11]. Let
$d\mu^\alpha$ (respectively, $d\mu_1^{\alpha,+}$) consist of all discounted μ-calculus formulas (respectively, all
discounted μ-calculus formulas with only the $\mathrm{pre}_1$ operator and all negations before atomic
propositions). It follows that for all game structures $G$ and states $s, t \in S$,

$$[s \preceq_1 t]^\alpha = \sup_{\varphi \in d\mu_1^{\alpha,+}} \bigl([\![\varphi]\!](s) - [\![\varphi]\!](t)\bigr) \qquad [s \simeq_g t]^\alpha = \sup_{\varphi \in d\mu^\alpha} \bigl|[\![\varphi]\!](s) - [\![\varphi]\!](t)\bigr| . \tag{2.6}$$

Metric kernels. The kernel of the metric $[\simeq_g]$ ($[\simeq_g]^\alpha$) defines an equivalence relation $\simeq_g$
($\simeq_g^\alpha$) on the states of a game structure: $s \simeq_g t$ ($s \simeq_g^\alpha t$) iff $[s \simeq_g t] = 0$ ($[s \simeq_g t]^\alpha = 0$); the
relation $\simeq_g$ is called the game bisimulation relation and the relation $\simeq_g^\alpha$ is called the
discounted game bisimulation relation. Similarly, we define the game simulation preorder
$s \preceq_1 t$ as the kernel of the directed metric $[\preceq_1]$, that is, $s \preceq_1 t$ iff $[s \preceq_1 t] = 0$. The
discounted game simulation preorder is defined analogously.

3. Bounds for Average and Discounted Payoff Games 

From (2.3) it follows that the game bisimulation metric provides a tight bound for the
difference in valuations of quantitative μ-calculus formulas. In this section, we show that the
game bisimulation metric also provides a bound for the difference in average and discounted 
value of games. This lends further support for the game bisimulation metric, and its kernel, 
the game bisimulation relation, being the canonical game metrics and relations. 

3.1. Discounted payoff games. Let $\pi_1$ and $\pi_2$ be strategies of player 1 and player 2
respectively. Let $\alpha \in [0, 1)$ be a discount factor. The $\alpha$-discounted payoff $v_1^\alpha(s, \pi_1, \pi_2)$ for
player 1 at a state $s$ for a variable $r \in \mathcal{V}$ and the strategies $\pi_1$ and $\pi_2$ is defined as:

$$v_1^\alpha(s, \pi_1, \pi_2) = (1 - \alpha) \cdot \sum_{n=0}^{\infty} \alpha^n \cdot E_s^{\pi_1, \pi_2}([r](X_n)) , \tag{3.1}$$

where $X_n$ is a random variable representing the state of the game in step $n$. The discounted
payoff for player 2 is defined as $v_2^\alpha(s, \pi_1, \pi_2) = -v_1^\alpha(s, \pi_1, \pi_2)$. Thus, player 1 wins (and
player 2 loses) the "discounted sum" of the valuations of $r$ along the path, where the
discount factor weighs future rewards with the discount $\alpha$. Given a state $s \in S$, we are
interested in finding the maximal payoff $w_i^\alpha(s)$ that player $i$ can ensure against all opponent
strategies, when the game starts from state $s$. This maximal payoff is given by:

$$w_i^\alpha(s) = \sup_{\pi_i \in \Pi_i} \inf_{\pi_{\sim i} \in \Pi_{\sim i}} v_i^\alpha(s, \pi_i, \pi_{\sim i}) .$$
These values can be computed as the limit of the sequence of $\alpha$-discounted, $n$-step rewards,
for $n \to \infty$. For $i \in \{1, 2\}$, we define a sequence of valuations $w_i^\alpha(0), w_i^\alpha(1), w_i^\alpha(2),
\ldots$ as follows: for all $s \in S$ and $n \geq 0$:

$$w_i^\alpha(n+1)(s) = (1 - \alpha) \cdot [r](s) + \alpha \cdot \mathrm{Pre}_i(w_i^\alpha(n))(s) , \tag{3.2}$$

where the initial valuation $w_i^\alpha(0)$ is arbitrary. Shapley proved that $w_i^\alpha = \lim_{n \to \infty} w_i^\alpha(n)$
[28].
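
Recurrence (3.2) is a value iteration. The sketch below instantiates it for a 1-MDP, where $\mathrm{Pre}_1$ is a maximum over the pure moves of player 1. The function name `discounted_values`, the fixed iteration count, and the two-state example are assumptions made for illustration only.

```python
def discounted_values(delta, moves, reward, alpha, n_iter=200):
    """Iterate w(n+1)(s) = (1 - alpha) * r(s) + alpha * max_a E[w(n)], as in (3.2).

    delta maps (state, move) to a dict successor -> probability (1-MDP notation);
    moves maps each state to its available moves; reward is the valuation [r].
    """
    w = dict(reward)  # w(0) is arbitrary; we start from [r]
    for _ in range(n_iter):
        w = {
            s: (1 - alpha) * reward[s]
            + alpha * max(
                sum(p * w[t] for t, p in delta[(s, a)].items())
                for a in moves[s]
            )
            for s in reward
        }
    return w


# Made-up MDP: in s the player may stay (reward 0) or go to the absorbing
# state t (reward 1).
delta = {
    ("s", "stay"): {"s": 1.0},
    ("s", "go"): {"t": 1.0},
    ("t", "stay"): {"t": 1.0},
}
moves = {"s": ["stay", "go"], "t": ["stay"]}
reward = {"s": 0.0, "t": 1.0}
w = discounted_values(delta, moves, reward, alpha=0.5)
```

Here the fixpoint equations give $w(t) = 1$ and $w(s) = 0.5 \cdot 0 + 0.5 \cdot \max\{w(s), w(t)\} = 0.5$.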

3.2. Average payoff games. Let $\pi_1$ and $\pi_2$ be strategies of player 1 and player 2 respectively.
The average payoff $v_1(s, \pi_1, \pi_2)$ for player 1 at a state $s$ for a variable $r \in \mathcal{V}$ and the
strategies $\pi_1$ and $\pi_2$ is defined as

$$v_1(s, \pi_1, \pi_2) = \liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} E_s^{\pi_1, \pi_2}([r](X_k)) , \tag{3.3}$$

where $X_k$ is a random variable representing the $k$-th state of the game. The reward for
player 2 is $v_2(s, \pi_1, \pi_2) = -v_1(s, \pi_1, \pi_2)$. A game structure $G$ with average payoff is called
an average reward game. The average value of the game $G$ at $s$ for player $i \in \{1, 2\}$ is
defined as

$$w_i(s) = \sup_{\pi_i \in \Pi_i} \inf_{\pi_{\sim i} \in \Pi_{\sim i}} v_i(s, \pi_i, \pi_{\sim i}) .$$

Mertens and Neyman established the determinacy of average reward games, and showed that
the limit of the discounted value of a game as all the discount factors tend to 1 is the same
as the average value of the game: for all $s \in S$ and $i \in \{1, 2\}$, we have $\lim_{\alpha \to 1} w_i^\alpha(s) = w_i(s)$
[23]. It is easy to show that the average value of a game is a valuation.

3.3. Metrics for discounted and average payoffs. We show that the game simulation
metric $[\preceq_1]$ provides a bound for discounted and long-run rewards. The discounted metric
$[\preceq_1]^\alpha$, on the other hand, does not provide such a bound, as the following example shows.




Figure 1: Example that shows that the discounted metric may not be an upper bound for 
the difference in the discounted value across states. 

Example 1. Consider a game consisting of four states $s, t, s', t'$, and a variable $r$, with
$[r](s) = 2$, $[r](s') = 2.1$, $[r](t) = 5$, and $[r](t') = 8$ as shown in Figure 1. All players
have only one move at each state, and the transition relation is deterministic. Consider
a discount factor $\alpha = 0.9$. The 0.9-discounted metric distance between states $s'$ and $s$ is
$[s' \simeq_g s]^{0.9} = 0.9 \cdot (8 - 5) = 2.7$. For the difference in discounted values between the states we
proceed as follows. Using formulation (3.2), taking $w^\alpha(0)(t) = 5$, since state $t$ is absorbing, we
get $w^\alpha(1)(t) = (1 - 0.9) \cdot 5 + 0.9 \cdot 5 = 5$, which leads to $w^\alpha(n)(t) = 5$ for all $n \geq 0$. Similarly,
$w^\alpha(n)(t') = 8$ for all $n \geq 0$. Therefore, the difference in discounted values between $s$ and $s'$,
again using (3.2), is given by: $w^\alpha(s') - w^\alpha(s) = (1 - 0.9) \cdot (2.1 - 2) + 0.9 \cdot (8 - 5) = 2.71$. ■
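
The numbers in Example 1 can be checked mechanically. The sketch below assumes the transitions $s \to t$ and $s' \to t'$ with $t$ and $t'$ absorbing, which is what the computation in the example implies. It also relies on the observation that, in a deterministic structure with a single move per state, $\mathrm{Pre}_1(k)(x) = k(x')$ for the unique successor $x'$, so the supremum over $k \in C(d)$ in (2.4) collapses to $d$ evaluated on the successors. All identifiers are hypothetical.

```python
ALPHA = 0.9
reward = {"s": 2.0, "s'": 2.1, "t": 5.0, "t'": 8.0}
succ = {"s": "t", "s'": "t'", "t": "t", "t'": "t'"}  # assumed from the example


def discounted_values(n_iter=500):
    """Value iteration (3.2); Pre reduces to evaluation at the unique successor."""
    w = dict(reward)
    for _ in range(n_iter):
        w = {x: (1 - ALPHA) * reward[x] + ALPHA * w[succ[x]] for x in reward}
    return w


def discounted_metric(n_iter=500):
    """Picard iteration of the discounted transformer on a deterministic structure.

    The iterates stay symmetric, so the simulation and bisimulation
    transformers coincide on this example.
    """
    d = {(x, y): 0.0 for x in reward for y in reward}
    for _ in range(n_iter):
        d = {
            (x, y): max(abs(reward[x] - reward[y]), ALPHA * d[(succ[x], succ[y])])
            for x in reward
            for y in reward
        }
    return d


w = discounted_values()
d = discounted_metric()
value_gap = w["s'"] - w["s"]  # difference in discounted values, about 2.71
metric_gap = d[("s'", "s")]   # discounted metric distance, about 2.7
```

Under these assumptions the value gap exceeds the discounted metric distance, in line with the example.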
In the following we consider player 1 rewards (the case for player 2 is identical).

Theorem 1. The following assertions hold.

(1) For all game structures $G$, $\alpha$-discounted rewards $w_1^\alpha$, for all states $s, t \in S$, we have
(a) $w_1^\alpha(s) - w_1^\alpha(t) \leq [s \preceq_1 t]$ and (b) $|w_1^\alpha(s) - w_1^\alpha(t)| \leq [s \simeq_g t]$.

(2) There exists a game structure $G$ and states $s, t \in S$ such that for all $\alpha$-discounted
rewards $w_1^\alpha$, $w_1^\alpha(t) - w_1^\alpha(s) > [t \simeq_g s]^\alpha$.

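Proof. We first prove assertion (1)(a). As the metric can be computed via Picard iteration,
we have for all $n > 0$:

$$[s \preceq_1^n t] = p(s,t) \sqcup \sup_{k \in C([\preceq_1^{n-1}])} \bigl(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\bigr) . \tag{3.4}$$

We prove by induction on $n \geq 0$ that $w_1^\alpha(n)(s) - w_1^\alpha(n)(t) \leq [s \preceq_1^n t]$. For all $s \in S$, taking
$w_1^\alpha(0)(s) = [r](s)$, the base case follows. Assume the result holds for $n - 1 \geq 0$. We have:

$$\begin{aligned}
w_1^\alpha(n)(s) - w_1^\alpha(n)(t) &= (1 - \alpha) \cdot [r](s) + \alpha \cdot \mathrm{Pre}_1(w_1^\alpha(n-1))(s) \\
&\qquad - (1 - \alpha) \cdot [r](t) - \alpha \cdot \mathrm{Pre}_1(w_1^\alpha(n-1))(t) \\
&= (1 - \alpha) \cdot ([r](s) - [r](t)) + \alpha \cdot \bigl(\mathrm{Pre}_1(w_1^\alpha(n-1))(s) - \mathrm{Pre}_1(w_1^\alpha(n-1))(t)\bigr) \\
&\leq (1 - \alpha) \cdot p(s,t) + \alpha \cdot [s \preceq_1^n t] \leq [s \preceq_1^n t] ,
\end{aligned}$$

where the last step follows by (3.4), since by the induction hypothesis we have $w_1^\alpha(n-1) \in
C([\preceq_1^{n-1}])$. This proves assertion (1)(a). Given (1)(a), from the definition of $[s \simeq_g t] =
[s \preceq_1 t] \sqcup [t \preceq_1 s]$, (1)(b) follows.

The example shown in Figure 1 proves the second assertion. ■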

Using the fact that the limit of the discounted reward, for a discount factor that approaches
1, is equal to the average reward, we obtain that the metrics provide a bound for
the difference in average values as well.

Corollary 1. For all game structures $G$ and states $s$ and $t$, we have (a) $w(s) - w(t) \leq
[s \preceq_1 t]$ and (b) $|w(s) - w(t)| \leq [s \simeq_g t]$.

3.4. Metrics for total rewards. The total reward $v_1^T(s, \pi_1, \pi_2)$ for player 1 at a state $s$
for a variable $r \in \mathcal{V}$ and the strategies $\pi_1 \in \Pi_1$ and $\pi_2 \in \Pi_2$ is defined as [17]:

$$v_1^T(s, \pi_1, \pi_2) = \liminf_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} \sum_{j=0}^{k} E_s^{\pi_1, \pi_2}([r](X_j)) , \tag{3.5}$$

where $X_j$ is a random variable representing the $j$-th state of the game. The payoff $v_2^T(s, \pi_1, \pi_2)$
for player 2 is defined by replacing $[r]$ with $-[r]$ in (3.5). The total-reward value of the game
$G$ at $s$ for player $i \in \{1, 2\}$ is defined analogously to the average value, via

$$w_i^T(s) = \sup_{\pi_i \in \Pi_i} \inf_{\pi_{\sim i} \in \Pi_{\sim i}} v_i^T(s, \pi_1, \pi_2) .$$

While the game simulation metric provides an upper bound for the difference in discounted
reward across states, as well as for the difference in average reward across states,
it does not provide a bound for the difference in total reward. We now introduce a new
metric, the total reward metric, $[\bowtie_g]$, which provides such a bound. For a discount factor
$\alpha \in [0, 1]$, we define a metric transformer $H^\alpha_{\unlhd_1} : \mathcal{M} \to \mathcal{M}$ as follows. For all $d \in \mathcal{M}$ and
$s, t \in S$, we let:

$$H^\alpha_{\unlhd_1}(d)(s,t) = p(s,t) + \alpha \cdot \sup_{k \in C(d)} \bigl(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\bigr) . \tag{3.6}$$

The metric $[\unlhd_1]^\alpha$ (resp. $[\bowtie_1]^\alpha$) is obtained as the least (resp. least symmetrical) fixpoint
of (3.6). We write $[\unlhd_1]$ for $[\unlhd_1]^1$, and $[\bowtie_1]$ for $[\bowtie_1]^1$. These metrics are reciprocal, i.e.,
$[\unlhd_1]^\alpha = [\unrhd_2]^\alpha$ and $[\bowtie_1]^\alpha = [\bowtie_2]^\alpha$. If $\alpha < 1$ we get the discounted total reward metric, and
if $\alpha = 1$ we get the undiscounted total reward metric. While the discounted total reward
metric is bounded, the undiscounted total reward metric may not be bounded. The total
metrics provide bounds for the difference in discounted, average, and total reward between
states.

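Theorem 2. The following assertions hold.

(1) For all game structures G, for all discount factors $\alpha \in [0, 1)$, for all states $s,t \in S$,
(a) $[s \trianglelefteq_1 t]^{\alpha} \le (\theta_2 - \theta_1)/(1 - \alpha)$, (b) $[s \trianglelefteq_1 t]^{\alpha} \le [s \trianglelefteq_1 t]$,

(c) $w_1^{\alpha}(s) - w_1^{\alpha}(t) \le [s \trianglelefteq_1 t]^{\alpha}$, (d) $w_1(s) - w_1(t) \le [s \trianglelefteq_1 t]$,

(e) $w_1^T(s) - w_1^T(t) \le [s \trianglelefteq_1 t]$.

(2) There exists a game structure G and states $s,t \in S$ such that $[s \trianglelefteq_1 t] = \infty$.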

Proof. For assertion (1)(a), notice that $p(s,t) \le (\theta_2 - \theta_1)$. Consider the n-step Picard iterate
towards the metric distance. We have,

$$[s \trianglelefteq_1^{n} t]^{\alpha} \le \sum_{i=0}^{n} \alpha^{i} \cdot (\theta_2 - \theta_1).$$

In the limit this yields $[s \trianglelefteq_1 t]^{\alpha} \le (\theta_2 - \theta_1)/(1 - \alpha)$. Assertion (1)(b) follows by induction
on the Picard iterations that realize the metric distance: for all $n \ge 0$, $[s \trianglelefteq_1^{n} t]^{\alpha} \le [s \trianglelefteq_1^{n} t]$.
Assertion (1)(c) follows by the definition of the discounted total reward metric, where we
have replaced the $\sqcup$ with a $+$. By induction, for all $n \ge 0$, from the proof of Theorem 1 we
have,

$$w_1^{\alpha}(n)(s) - w_1^{\alpha}(n)(t) \le (1-\alpha) \cdot p(s,t) + \alpha \cdot \big(\mathrm{Pre}_1(w_1^{\alpha}(n-1))(s) - \mathrm{Pre}_1(w_1^{\alpha}(n-1))(t)\big) \le [s \trianglelefteq_1 t]^{\alpha}.$$

For assertion (1)(d), towards an inductive argument on the Picard iterates that realize the
metric, for all $n \ge 0$, we have $[s \preceq_1^{n} t] \le [s \trianglelefteq_1^{n} t]$, which in the limit gives $[s \preceq_1 t] \le [s \trianglelefteq_1 t]$.
This leads to $w_1(s) - w_1(t) \le [s \trianglelefteq_1 t]$, using Corollary 1. This proves assertion (1)(d). We
now prove assertion (1)(e) by induction and show that for all $n \ge 0$, $w_1^T(n)(s) - w_1^T(n)(t) \le
[s \trianglelefteq_1^{n} t]$. As the metric can be computed via Picard iteration, we have for all $n > 0$:

$$[s \trianglelefteq_1^{n} t] = p(s,t) + \sup_{k \in C([\trianglelefteq_1^{n-1}])} \big(\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)\big). \qquad (3.7)$$

We define a valuation transformer $u : \mathbb{N} \to \mathcal{F}$ as $u(0) = [r]$ and, for all $n > 0$ and states
$s \in S$, as

$$u(n)(s) = [r](s) + \mathrm{Pre}_1(u(n-1))(s).$$

We take $w_1^T(0) = u(0) = [r]$ and for $n > 0$, from the definition of total rewards (3.5), we
get the n-step total reward value at a state $s \in S$ in terms of u as,

$$w_1^T(n)(s) = \frac{1}{n} \cdot \sum_{i=1}^{n} u(i)(s).$$

Notice that $w_1^T(n) \le u(n)$ for all $n \ge 0$. When $n = 0$, the result is immediate by the
definition of $w_1^T(0)$, noticing that $[s \trianglelefteq_1^{0} t] = p(s,t)$. Assume the result holds for $n - 1 \ge 0$.
We have:

$$w_1^T(n)(s) - w_1^T(n)(t) = \frac{1}{n} \sum_{i=1}^{n} u(i)(s) - \frac{1}{n} \sum_{i=1}^{n} u(i)(t)$$

$$= \frac{1}{n} \sum_{i=1}^{n} \Big( \big([r](s) - [r](t)\big) + \big(\mathrm{Pre}_1(u(i-1))(s) - \mathrm{Pre}_1(u(i-1))(t)\big) \Big) \qquad (3.8)$$

$$\le \frac{1}{n} \sum_{i=1}^{n} [s \trianglelefteq_1^{i} t] \qquad (3.9)$$

$$\le [s \trianglelefteq_1^{n} t], \qquad (3.10)$$

where (3.9) follows from (3.8) by (3.7), since by our induction hypothesis we have $w_1^T(i) \le
u(i) \in C([\trianglelefteq_1^{i}])$ for all $0 \le i < n$, and (3.10) follows from (3.9) by the monotonicity of the
undiscounted total reward metric. To prove assertion (2), consider the game structure on the
left hand side in Figure 1. The total reward at state s is unbounded; $w_1^T(s) = 2 + 5 + \dots = \infty$.
Now consider a modified version of the game, with identical structure and with states s'
and t' corresponding to s and t of the original game. Let $[r](t') = 0$. In the modified game,
$w_1^T(s') = 2$. From result (1)(e), since $w_1^T(s) = \infty$ and $w_1^T(s') = 2$, we have $[s \trianglelefteq_1 s'] = \infty$. ■
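The recursion used in the proof of assertion (1)(e) can be sketched for the special case of an MDP in which only player 1 has a choice of moves, so that $\mathrm{Pre}_1$ reduces to a maximum over pure moves. The dictionary encoding of the MDP, the function names, and the two-state example below are assumptions made for illustration; they are not the game of Figure 1.

```python
from fractions import Fraction

def pre1(k, delta, state):
    """Pre_1(k)(state) in an MDP: best expected value of k over pure moves."""
    return max(
        sum(p * k[succ] for succ, p in dist.items())
        for dist in delta[state].values()
    )

def total_reward_iterates(reward, delta, n):
    """Return [u(0), ..., u(n)] with u(0) = [r] and u(i) = [r] + Pre_1(u(i-1))."""
    u = [dict(reward)]
    for _ in range(n):
        prev = u[-1]
        u.append({s: reward[s] + pre1(prev, delta, s) for s in reward})
    return u

def n_step_total_value(reward, delta, n, state):
    """w^T(n)(state) = (1/n) * sum_{i=1..n} u(i)(state), with w^T(0) = [r]."""
    if n == 0:
        return Fraction(reward[state])
    u = total_reward_iterates(reward, delta, n)
    return sum((Fraction(u[i][state]) for i in range(1, n + 1)), Fraction(0)) / n

# Hypothetical example: s loops on itself with reward 1; t is absorbing with reward 0.
reward = {"s": 1, "t": 0}
delta = {
    "s": {"a": {"s": Fraction(1)}},
    "t": {"a": {"t": Fraction(1)}},
}
```

In this toy example u(i)(s) = i + 1, so the n-step value at s grows without bound while it stays 0 at t, which is the kind of behavior that assertion (2) exploits.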

It is a very simple observation that the quantitative $\mu$-calculus does not provide a logical
characterization for $[\trianglelefteq_1]$ or $[\bowtie_1]$. In fact, all formulas of the quantitative $\mu$-calculus have
valuations in the interval $[\theta_1, \theta_2]$, while, as stated in Theorem 2, the total reward can be
unbounded. The difference is essentially due to the fact that our version of the quantitative
$\mu$-calculus lacks a "+" operator. It is not clear how to introduce such a + operator in
a context sufficiently restricted to provide a logical characterization for $[\trianglelefteq_1]$; above all,
it is not clear whether a canonical calculus, with interesting formal properties, would be
obtained.

3.5. Metric kernels. We now show that the kernels of all the metrics defined in the paper
coincide: an algorithm developed for the game kernels $\preceq_1$ and $\simeq_g$ computes the kernels of
the corresponding discounted and total reward metrics as well.

Theorem 3. For all game structures G, states s and t, all discount factors $\alpha \in [0, 1)$, the
following statements are equivalent:

(a) $[s \preceq_1 t] = 0$, (b) $[s \preceq_1 t]^{\alpha} = 0$, (c) $[s \trianglelefteq_1 t]^{\alpha} = 0$.

Proof. We prove (a) $\Rightarrow$ (b) $\Rightarrow$ (c) $\Rightarrow$ (a). We assume $0 < \alpha < 1$. Assertion (a) implies that
$p(s,t) = 0$ and $\sup_{k \in C([\preceq_1])} (\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t)) \le 0$. Since $C([\preceq_1]^{\alpha}) \subseteq C([\preceq_1])$ from
(2.5), (b) follows. We prove (b) $\Rightarrow$ (c) by induction on the Picard iterations that compute
$[s \preceq_1 t]^{\alpha}$ and $[s \trianglelefteq_1 t]^{\alpha}$. The base case is immediate. Assume that for all states s and t,
$[s \preceq_1^{n-1} t]^{\alpha} = 0$ implies $[s \trianglelefteq_1^{n-1} t]^{\alpha} = 0$. Towards a contradiction, assume $[s \preceq_1^{n} t]^{\alpha} = 0$ but
$[s \trianglelefteq_1^{n} t]^{\alpha} > 0$. Then there must be $k \in C([\trianglelefteq_1^{n-1}]^{\alpha})$ such that $\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t) > 0$.
By our induction hypothesis, there exists a $\delta > 0$ such that $k' = \delta \cdot k \in C([\preceq_1^{n-1}]^{\alpha})$. Since
Pre is multi-linear, the player optimal responses in $\mathrm{Pre}_1(k)(s)$ remain optimal for $k'$. But
this means $(\mathrm{Pre}_1(k')(s) - \mathrm{Pre}_1(k')(t)) > 0$ for $k' \in C([\preceq_1^{n-1}]^{\alpha})$, leading to $[s \preceq_1^{n} t]^{\alpha} > 0$; a
contradiction. Therefore, (b) $\Rightarrow$ (c). In a similar fashion we can show that (c) $\Rightarrow$ (a). ■



4. Algorithms for Turn-Based Games and MDPs 

In this section, we present algorithms for computing the metric and its kernel for turn-based
games and MDPs. We first present a polynomial-time algorithm to compute the
operator $H_{\preceq_i}(d)$ that gives the exact one-step distance between two states, for $i \in \{1,2\}$.
We then present a PSPACE algorithm to decide whether the limit distance between two
states s and t (i.e., $[s \preceq_i t]$) is at most a rational value r. Our algorithm matches the
best bound known for the special class of Markov chains [7TT ] . Finally, we present
improved algorithms for the important case of the kernel of the metrics. Since by Theorem 3
the kernels of the metrics introduced in this paper coincide, we present our algorithms for the
kernel of the undiscounted metric. For the bisimulation kernel our algorithm is significantly
more efficient than previous algorithms.

4.1. Algorithms for the metrics. For turn-based games and MDPs, only one player has
a choice of moves at a given state. We consider two player 1 states. A similar analysis
applies to player 2 states. We remark that the distance between states in $S_1$ and $S_2$ is
always $\theta_2 - \theta_1$ due to the existence of the variable turn. For a metric $d \in \mathcal{M}$, and states
$s,t \in S_1$, computing $H_{\preceq_1}(d)(s,t)$, given that $p(s,t)$ is trivially computed by its definition,
entails evaluating the expression $\sup_{k \in C(d)} (\mathrm{Pre}_1(k)(s) - \mathrm{Pre}_1(k)(t))$, which is the same as
$\sup_{k \in C(d)} \sup_{x \in \mathcal{D}_1(s)} \inf_{y \in \mathcal{D}_1(t)} (\mathbb{E}_s^{x}(k) - \mathbb{E}_t^{y}(k))$, since $\mathrm{Pre}_1(k)(s) = \sup_{x \in \mathcal{D}_1(s)} \mathbb{E}_s^{x}(k)$ and
$\mathrm{Pre}_1(k)(t) = \sup_{y \in \mathcal{D}_1(t)} \mathbb{E}_t^{y}(k)$, as player 1 is the only player with a choice of moves at states
s and t. By expanding the expectations, we get the following form,




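$$\sup_{k \in C(d)} \sup_{x \in \mathcal{D}_1(s)} \inf_{y \in \mathcal{D}_1(t)} \Big( \sum_{u \in S} \sum_{a \in \Gamma_1(s)} \delta(s,a)(u) \cdot x(a) \cdot k(u) - \sum_{v \in S} \sum_{b \in \Gamma_1(t)} \delta(t,b)(v) \cdot y(b) \cdot k(v) \Big). \qquad (4.1)$$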

We observe that the one-step distance as defined in (4.1) is a sup-inf non-linear (quadratic)
optimization problem. We now present two lemmas by which we transform (4.1) to an inf
linear optimization problem, which we solve by linear programming (LP). The first lemma
reduces (4.1) to an equivalent formulation that considers only pure moves at state s. The
second lemma further reduces (4.1), using duality, to a formulation that can be solved using
LP.

Lemma 1. For all turn-based game structures G, for all player i states s and t, given a
metric $d \in \mathcal{M}$, the following equality holds,

$$\sup_{k \in C(d)} \sup_{x \in \mathcal{D}_i(s)} \inf_{y \in \mathcal{D}_i(t)} \big(\mathbb{E}_s^{x}(k) - \mathbb{E}_t^{y}(k)\big) = \sup_{a \in \Gamma_i(s)} \inf_{y \in \mathcal{D}_i(t)} \sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big).$$



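Proof. We prove the result for player 1 states s and t, with the proof being identical for
player 2. Given a metric $d \in \mathcal{M}$, we have,

$$\sup_{k \in C(d)} \sup_{x \in \mathcal{D}_1(s)} \inf_{y \in \mathcal{D}_1(t)} \big(\mathbb{E}_s^{x}(k) - \mathbb{E}_t^{y}(k)\big)$$

$$= \sup_{k \in C(d)} \Big( \sup_{x \in \mathcal{D}_1(s)} \mathbb{E}_s^{x}(k) - \sup_{y \in \mathcal{D}_1(t)} \mathbb{E}_t^{y}(k) \Big)$$

$$= \sup_{k \in C(d)} \Big( \sup_{a \in \Gamma_1(s)} \mathbb{E}_s^{a}(k) - \sup_{y \in \mathcal{D}_1(t)} \mathbb{E}_t^{y}(k) \Big) \qquad (4.2)$$

$$= \sup_{k \in C(d)} \sup_{a \in \Gamma_1(s)} \inf_{y \in \mathcal{D}_1(t)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big)$$

$$= \sup_{a \in \Gamma_1(s)} \sup_{k \in C(d)} \inf_{y \in \mathcal{D}_1(t)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big) \qquad (4.3)$$

$$= \sup_{a \in \Gamma_1(s)} \inf_{y \in \mathcal{D}_1(t)} \sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big). \qquad (4.4)$$

For a fixed $k \in C(d)$, since pure optimal strategies exist at each state for turn-based games
and MDPs, we replace the $\sup_{x \in \mathcal{D}_1(s)}$ with $\sup_{a \in \Gamma_1(s)}$, yielding (4.2). Since the difference in
expectations is multi-linear, $y \in \mathcal{D}_1(t)$ is a probability distribution and $C(d)$ is a compact
convex set, we can use the generalized minimax theorem [29], and interchange the innermost
sup inf to get (4.4) from (4.3). ■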
The proof of Lemma 1 is illustrated using the following example.

[Figure 2, panels (a) MDP 1 and (b) MDP 2: An example illustrating the proof of Lemma 1.]



Example 4.1. Consider the example in Figure 2. In the MDPs shown in the figure, every
move leads to a unique successor state, with the exception of move $e \in \Gamma_1(s)$, which leads
to states u and v with equal probability. Assume the variable valuations are such that
all states are at a propositional distance of 1. Without loss of generality, assume that the
valuation $k \in C(d)$ is such that $k(u) > k(v)$. By the linearity of expectations, for move
$c \in \Gamma_1(s)$, $\mathbb{E}_s^{c}(k) \ge \mathbb{E}_s^{x}(k)$ for all $x \in \mathcal{D}_1(s)$. Similar arguments can be made for $k(u) < k(v)$.
This gives an informal justification for step (4.2) in the proof; given a $k \in C(d)$, there exist
pure optimal strategies for the single player with a choice of moves at each state. While we
can use pure moves at states s and t if $k \in C(d)$ is known, the principal difficulty in directly
computing the left hand side of the equality arises from the uncountably many values for
k; the distance is the supremum over all possible values of k. In the final equality, step
(4.4), and hence by this lemma, we have avoided this difficulty by showing an equivalent
expression that picks a $k \in C(d)$ to show the difference in distributions induced over states.
As we shall see, this enables computing the one-step metric distance using a trans-shipping
formulation. We remark that while we can use pure moves at state s, we cannot do so at
state t in the right hand side of step (4.4) of the proof. Firstly, the proof of the lemma
depends on the set $\mathcal{D}_1(t)$, over which y ranges, being convex. Secondly, if we could restrict our attention to pure
moves at state t, then we could replace $\inf_{y \in \mathcal{D}_1(t)}$ with $\inf_{b \in \Gamma_1(t)}$ on the right hand side. But
this yields too fine a one-step distance. Consider move e at state s. We see that neither c
nor b at state t yields a distribution over states that matches the distribution induced by e. We
can then always pick $k \in C(d)$ such that $\mathbb{E}_s^{e}(k) - \mathbb{E}_t^{y}(k) > 0$ for a pure move y. If we choose $y \in \mathcal{D}_1(t)$ such that
$y(b) = y(c) = \frac{1}{2}$, we match the distribution induced by move e from state s, which implies
that for any choice of $k \in C(d)$, $\mathbb{E}_s^{e}(k) - \mathbb{E}_t^{y}(k) = 0$. Intuitively, the right hand side
of the equality can be interpreted as a game between a protagonist and an antagonist, with
the protagonist picking $y \in \mathcal{D}_1(t)$, for every pure move $a \in \Gamma_1(s)$, to match the induced
distributions over states. The antagonist then picks a $k \in C(d)$ to maximize the difference
in induced distributions. If the distributions match, then no choice of $k \in C(d)$ yields a
difference in expectations bounded away from 0.

From Lemma 1, given $d \in \mathcal{M}$, we can write the player 1 one-step distance between
states s and t as follows,

$$\mathrm{OneStep}(s,t,d) = \sup_{a \in \Gamma_1(s)} \inf_{y \in \mathcal{D}_1(t)} \sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big). \qquad (4.5)$$

Hence we compute for all $a \in \Gamma_1(s)$, the expression,

$$\mathrm{OneStep}(s,t,d,a) = \inf_{y \in \mathcal{D}_1(t)} \sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big),$$

and then choose the maximum, i.e., $\max_{a \in \Gamma_1(s)} \mathrm{OneStep}(s,t,d,a)$. We now present a lemma
that helps reduce the above inf-sup optimization problem to a linear program. We first
introduce some notation. We denote by $\lambda$ the set of variables $\lambda_{u,v}$, for $u,v \in S$. Given
$a \in \Gamma_1(s)$, and a distribution $y \in \mathcal{D}_1(t)$, we write $\lambda \in \Phi(s,t,a,y)$ if the following linear
constraints are satisfied:

(1) for all $v \in S$: $\sum_{u \in S} \lambda_{u,v} = \delta(s,a)(v)$;
(2) for all $u \in S$: $\sum_{v \in S} \lambda_{u,v} = \sum_{b \in \Gamma_1(t)} y(b) \cdot \delta(t,b)(u)$;
(3) for all $u,v \in S$: $\lambda_{u,v} \ge 0$.

Lemma 2. For all turn-based game structures and MDPs G, for all $d \in \mathcal{M}$, and for all
$s,t \in S$, the following assertion holds:

$$\sup_{a \in \Gamma_1(s)} \inf_{y \in \mathcal{D}_1(t)} \sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big) = \sup_{a \in \Gamma_1(s)} \inf_{y \in \mathcal{D}_1(t)} \inf_{\lambda \in \Phi(s,t,a,y)} \Big( \sum_{u,v \in S} d(u,v) \cdot \lambda_{u,v} \Big).$$

Proof. Since duality always holds in LP, from the LP duality based results of [32], for all
$a \in \Gamma_1(s)$ and $y \in \mathcal{D}_1(t)$, the maximization over all $k \in C(d)$ can be rewritten as a
minimization problem as follows:

$$\sup_{k \in C(d)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{y}(k)\big) = \inf_{\lambda \in \Phi(s,t,a,y)} \Big( \sum_{u,v \in S} d(u,v) \cdot \lambda_{u,v} \Big).$$

The formula on the right hand side of the above equality is the trans-shipping formulation,
which solves for the minimum cost of shipping the distribution $\delta(s,a)$ into $\delta(t,y)$, with edge
costs d. The result of the lemma follows. ■
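The duality behind Lemma 2 can be illustrated by checking a primal-dual certificate: any valuation in $C(d)$ gives a lower bound on the shipping cost of any feasible plan, so a pair with equal values certifies that both are optimal. The sketch below is a minimal check under stated assumptions: the function names are hypothetical, the plan is indexed as `lam[x, y]` = mass shipped from x (a successor of the first state) to y (a successor of the second state), which is the sketch's own indexing convention, the three-state fragment reuses the numbers for move c in Example 4.2, and $\varepsilon = 1/10$ and the distances among these three states are chosen for illustration.

```python
from fractions import Fraction

def expectation(dist, k):
    """Expected value of the valuation k under a distribution over states."""
    return sum(p * k[state] for state, p in dist.items())

def in_C(k, d, lo=0, hi=1):
    """k is in C(d): values lie in [lo, hi] and k(u) - k(v) <= d(u, v)."""
    return all(lo <= k[u] <= hi for u in k) and all(
        k[u] - k[v] <= d[u, v] for u in k for v in k
    )

def is_coupling(lam, source, target):
    """lam is non-negative with row sums `source` and column sums `target`."""
    states = set(source) | set(target) | {x for x, _ in lam} | {y for _, y in lam}
    rows_ok = all(
        sum(m for (x, _), m in lam.items() if x == u) == source.get(u, 0)
        for u in states
    )
    cols_ok = all(
        sum(m for (_, y), m in lam.items() if y == v) == target.get(v, 0)
        for v in states
    )
    return rows_ok and cols_ok and all(m >= 0 for m in lam.values())

def shipping_cost(lam, d):
    """Total cost of the plan lam under edge costs d."""
    return sum(d[x, y] * m for (x, y), m in lam.items())

# Illustrative data (assumed): u and u' are at distance 0, v' is at distance 1.
eps = Fraction(1, 10)
states = ["u", "u'", "v'"]
zero_pairs = {("u", "u'"), ("u'", "u")}
d = {(x, y): 0 if x == y or (x, y) in zero_pairs else 1
     for x in states for y in states}
source = {"u": Fraction(1)}                                   # pure move to u
target = {"u'": Fraction(1, 2) - eps, "v'": Fraction(1, 2) + eps}
lam = {("u", "u'"): Fraction(1, 2) - eps, ("u", "v'"): Fraction(1, 2) + eps}
k = {"u": 1, "u'": 1, "v'": 0}

gap = expectation(source, k) - expectation(target, k)
cost = shipping_cost(lam, d)
```

Here `gap` and `cost` both equal $\frac{1}{2} + \varepsilon$, so under these assumptions neither a better valuation nor a cheaper plan exists for this pair of distributions.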
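Using the above result we obtain the following LP for $\mathrm{OneStep}(s,t,d,a)$ over the
variables: (a) $\{\lambda_{u,v}\}_{u,v \in S}$, and (b) $y_b$ for $b \in \Gamma_1(t)$:

$$\text{Minimize } \sum_{u,v \in S} d(u,v) \cdot \lambda_{u,v} \text{ subject to} \qquad (4.6)$$

(1) for all $v \in S$: $\sum_{u \in S} \lambda_{u,v} = \delta(s,a)(v)$;
(2) for all $u \in S$: $\sum_{v \in S} \lambda_{u,v} = \sum_{b \in \Gamma_1(t)} y_b \cdot \delta(t,b)(u)$;
(3) for all $u,v \in S$: $\lambda_{u,v} \ge 0$;
(4) for all $b \in \Gamma_1(t)$: $y_b \ge 0$;
(5) $\sum_{b \in \Gamma_1(t)} y_b = 1$.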



Example 4.2. We now use the MDPs in Figures 3(a) and 3(b) to compute the simulation
distance between states using the results in Lemma 1 and Lemma 2. In the figure, states
of the same color have a propositional distance of 0 and states of different colors have a
propositional distance of 1; $p(s,s') = p(t,t') = p(u,u') = p(v,v') = p(t',w') = 0$. In MDP 1,
shown in Figure 3(a), $\delta(s,a)(t) = \delta(t,b)(v) = \delta(t,c)(u) = 1$ and $\delta(t,f)(u) = \delta(t,f)(v) =
\frac{1}{2}$. In MDP 2, shown in Figure 3(b), $\delta(s',a)(w') = \delta(s',b)(t') = 1$, $\delta(t',c)(u') = \frac{1}{2} - \varepsilon$,
$\delta(t',c)(v') = \frac{1}{2} + \varepsilon$, $\delta(w',e)(u') = \delta(w',f)(v') = 1 - \varepsilon$ and $\delta(w',e)(v') = \delta(w',f)(u') = \varepsilon$.

[Figure 3, panels (a) MDP 1 and (b) MDP 2: An example used to compute the simulation metric
between states. States of the same color have a propositional distance of 0.]

In Table 2, we show the simulation metric distance between states of the MDPs in
Figure 3(a) and Figure 3(b). Consider states t and t'. The move c is the only move available to
player 1 from state t' and it induces a transition probability of $\frac{1}{2} + \varepsilon$ to state v' and $\frac{1}{2} - \varepsilon$
to state u'. For the pure move c at state t, the induced transition probabilities and edge
costs in the trans-shipping formulation are shown in Figure 4(a). It is easy to see that the
trans-shipping cost in this case is $\frac{1}{2} + \varepsilon$; it is shown in Table 1 along the row corresponding to
move c from state t and the column corresponding to state t'. Similarly, the trans-shipping
costs for the moves b and f from state t are $\frac{1}{2} - \varepsilon$ and $\varepsilon$ respectively. The metric distance
$[t \preceq t']$, which is the maximum over these trans-shipping costs, is then $\frac{1}{2} + \varepsilon$. Now consider
the states t and w'. In Table 1 we show for each pure move $a \in \Gamma_1(t)$, the move $x \in \mathcal{D}_1(w')$
that minimizes the trans-shipping cost together with the minimum cost. In this case it is
easy to see that $[t \preceq w'] = \varepsilon$. Given $[t \preceq t'] = \frac{1}{2} + \varepsilon$ and $[t \preceq w'] = \varepsilon$, we can calculate the
distance $[s \preceq s']$ from the trans-shipping formulation shown in Figure 4(b); the minimum
cost is $\varepsilon$, which entails choosing move a from state s', giving us $[s \preceq s'] = \varepsilon$.

    Move in $\Gamma_1(t)$   $x \in \mathcal{D}_1(w')$           Cost (w')   $x \in \mathcal{D}_1(t')$   Cost (t')
    b                       $x(f) = 1$                          $\varepsilon$       $x(c) = 1$          $\frac{1}{2} - \varepsilon$
    c                       $x(e) = 1$                          $\varepsilon$       $x(c) = 1$          $\frac{1}{2} + \varepsilon$
    f                       $x(f) = x(e) = \frac{1}{2}$         $0$                 $x(c) = 1$          $\varepsilon$

Table 1: The moves from states w' and t' that minimize the trans-shipping cost for each
$a \in \Gamma_1(t)$ and the corresponding costs.

    $[\preceq]$   s'                t'                            w'                u'    v'
    s             $\varepsilon$     1                             1                 1     1
    t             1                 $\frac{1}{2} + \varepsilon$   $\varepsilon$     1     1
    u             1                 1                             1                 0     1
    v             1                 1                             1                 1     0

Table 2: The simulation metric distance between states in MDP 1 and states in MDP 2.

[Figure 4, panels (a) $[t \preceq t'] = \frac{1}{2} + \varepsilon$ and (b) $[s \preceq s'] = \varepsilon$: The trans-shipping
formulation that gives the metric distances between states.]

Theorem 4.3. For all turn-based game structures and MDPs G, given $d \in \mathcal{M}$, for all states
$s,t \in S$, we can compute $H_{\preceq_1}(d)(s,t)$ in polynomial time by the Linear Program (4.6).

For all states $s,t \in S$, iteration of $\mathrm{OneStep}(s,t,d)$ converges to the exact distance.
However, in general, there are no known bounds for the rate of convergence. We now present
a decision procedure to check whether the exact distance between two states is at most a
rational value r. We first show how to express the predicate $d(s,t) = \mathrm{OneStep}(s,t,d)$. We
observe that since $H_{\preceq_1}$ is non-decreasing, we have $\mathrm{OneStep}(s,t,d) \ge d(s,t)$. It follows that
the equality $d(s,t) = \mathrm{OneStep}(s,t,d)$ holds iff for every $a \in \Gamma_1(s)$, of which there are finitely
many, all the linear inequalities of LP (4.6) are satisfied, and $d(s,t) = \sum_{u,v \in S} d(u,v) \cdot \lambda_{u,v}$
holds. It then follows that $d(s,t) = \mathrm{OneStep}(s,t,d)$ can be written as a predicate in the
theory of real closed fields. Given a rational r and two states s and t, we present an existential
theory of reals formula to decide whether $[s \preceq_1 t] \le r$. Since $[s \preceq_1 t]$ is the least fixed point
of $H_{\preceq_1}$, we define a formula $\Phi(r)$ that is true iff, in the fixpoint, $[s \preceq_1 t] \le r$, as follows:

$$\exists d \in \mathcal{M}. \Big[ \Big( \bigwedge_{u,v \in S} \mathrm{OneStep}(u,v,d) = d(u,v) \Big) \wedge \big( d(s,t) \le r \big) \Big].$$

If the formula $\Phi(r)$ is true, then there exists a fixpoint d, such that $d(s,t)$ is bounded by
r, which implies that in the least fixpoint $d(s,t)$ is bounded by r. Conversely, if in the
least fixpoint $d(s,t)$ is bounded by r, then the least fixpoint is a witness d for $\Phi(r)$ being
true. Since the existential theory of reals is decidable in PSPACE [B], we have the following 
result. 

Theorem 4.4. (Decision complexity for exact distance). For all turn-based game structures
and MDPs G, given a rational r, and two states s and t, whether $[s \preceq_1 t] \le r$ can be decided
in PSPACE.

Approximation. Given a rational $\varepsilon > 0$, using binary search and $O(\log(\frac{\theta_2 - \theta_1}{\varepsilon}))$ calls to
check the formula $\Phi(r)$, we can obtain an interval $[l, u]$ with $u - l \le \varepsilon$ such that $[s \preceq_1 t]$ lies
in the interval $[l, u]$.

Corollary 2. (Approximation for exact distance). For all turn-based game structures
and MDPs G, given a rational $\varepsilon$, and two states s and t, an interval $[l, u]$ with $u - l \le \varepsilon$
such that $[s \preceq_1 t] \in [l, u]$ can be computed in PSPACE.
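The binary search behind the approximation result can be sketched as follows. The callable `is_at_most` is a hypothetical stand-in for the decision procedure that checks $\Phi(r)$; the sketch also assumes that the distance lies in $[0, \theta_2 - \theta_1]$. The mock oracle is for illustration only.

```python
from fractions import Fraction

def approximate_distance(is_at_most, theta1, theta2, eps):
    """Return (l, u, calls) with u - l <= eps and l <= distance <= u.

    is_at_most(r) is assumed to decide whether the distance is at most r.
    """
    lo, hi = Fraction(0), Fraction(theta2 - theta1)
    calls = 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        calls += 1
        if is_at_most(mid):
            hi = mid   # distance <= mid
        else:
            lo = mid   # distance > mid
    return lo, hi, calls

# Mock oracle for a distance of 3/5 (illustrative only).
true_distance = Fraction(3, 5)
l, u, calls = approximate_distance(
    lambda r: true_distance <= r, 0, 1, Fraction(1, 100)
)
```

Each call halves the interval, so the number of oracle calls is the smallest k with $(\theta_2 - \theta_1)/2^k \le \varepsilon$, in line with the logarithmic bound stated above.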

4.2. Algorithms for the kernel. The kernel $\preceq_1$ of the simulation metric can be computed
as the limit of the series $\preceq_1^{0}, \preceq_1^{1}, \dots$ of relations. For all $s,t \in S$, we have $(s,t) \in \preceq_1^{0}$
iff $s \equiv t$. For all $n \ge 0$, we have $(s,t) \in \preceq_1^{n+1}$ iff $\mathrm{OneStep}(s,t,1_{\preceq_1^{n}}) = 0$. Checking the
condition $\mathrm{OneStep}(s,t,1_{\preceq_1^{n}}) = 0$ corresponds to solving an LP feasibility problem for every
$a \in \Gamma_1(s)$, as it suffices to replace the minimization goal $\gamma = \sum_{u,v \in S} 1_{\preceq_1^{n}}(u,v) \cdot \lambda_{u,v}$ with the
constraint $\gamma = 0$ in the LP (4.6). We note that this is the same LP feasibility problem that
was introduced in [35] as part of an algorithm to decide simulation of probabilistic systems
in which each label may lead to one or more distributions over states.

For the bisimulation kernel, we present a more efficient algorithm, which also improves 
on the algorithms presented in [35]. The idea is to proceed by partition refinement, as usual 
for bisimulation computations. The refinement step is as follows: given a partition, two 
states s and t belong to the same refined partition iff every pure move from s induces a 
probability distribution on equivalence classes that can be matched by mixed moves from t, 
and vice versa. Precisely, we compute a sequence $Q^0, Q^1, Q^2, \dots$ of partitions. Two states
s, t belong to the same class of $Q^0$ iff they have the same variable valuation (i.e., iff $s \equiv t$).
For $n \ge 0$, since by the definition of the bisimulation metric given in (2.2), $[s \simeq_g t] = 0$ iff
$[s \preceq_1 t] = 0$ and $[t \preceq_1 s] = 0$, two states s, t in a given class of $Q^n$ remain in the same class in
$Q^{n+1}$ if both (s, t) and (t, s) satisfy the set of feasibility LP problems $\mathrm{OneStepBis}(s,t,Q^n)$
as given below:

$\mathrm{OneStepBis}(s,t,Q)$ consists of one feasibility LP problem for each $a \in \Gamma(s)$.
The problem for $a \in \Gamma(s)$ has set of variables $\{x_b \mid b \in \Gamma(t)\}$, and set of
constraints:

(1) for all $b \in \Gamma(t)$: $x_b \ge 0$,
(2) $\sum_{b \in \Gamma(t)} x_b = 1$,
(3) for all $V \in Q$: $\sum_{b \in \Gamma(t)} \sum_{u \in V} x_b \cdot \delta(t,b)(u) \ge \sum_{u \in V} \delta(s,a)(u)$.
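The refinement loop can be sketched for the special case of Markov chains, where every state has a single move, so that the only choice in the feasibility problem is $x_b = 1$ and constraint (3) reduces to a direct comparison of class masses. For general MDPs the feasibility check would need an LP solver and is not shown. The function names, the dictionary encoding, and the five-state chain are assumptions made for illustration.

```python
from fractions import Fraction

def class_mass(dist, block):
    """Probability mass that a distribution assigns to a class of states."""
    return sum(p for state, p in dist.items() if state in block)

def one_step_bis_feasible(s, t, partition, delta):
    """Constraint (3) when t has a single move (x_b = 1 is forced)."""
    return all(
        class_mass(delta[t], V) >= class_mass(delta[s], V) for V in partition
    )

def bisimulation_partition(labels, delta):
    """Refine the partition induced by the labels until no class splits."""
    by_label = {}
    for state, label in labels.items():
        by_label.setdefault(label, []).append(state)
    partition = [frozenset(block) for block in by_label.values()]  # Q^0
    while True:
        refined = []
        for block in partition:
            groups = []  # sub-classes of `block`, each compared via its first state
            for s in sorted(block):
                for g in groups:
                    t = g[0]
                    if (one_step_bis_feasible(s, t, partition, delta)
                            and one_step_bis_feasible(t, s, partition, delta)):
                        g.append(s)
                        break
                else:
                    groups.append([s])
            refined.extend(frozenset(g) for g in groups)
        if len(refined) == len(partition):
            return set(refined)
        partition = refined

# Hypothetical Markov chain: x and y behave alike, z reaches goal less often.
half, third = Fraction(1, 2), Fraction(1, 3)
labels = {"x": 0, "y": 0, "z": 0, "sink": 0, "goal": 1}
delta = {
    "x": {"goal": half, "sink": half},
    "y": {"goal": half, "sink": half},
    "z": {"goal": third, "sink": 1 - third},
    "goal": {"goal": Fraction(1)},
    "sink": {"sink": Fraction(1)},
}
classes = bisimulation_partition(labels, delta)
```

Comparing each state against one representative per sub-class is sound here because, for single-move states, two-way feasibility amounts to equal mass on every class, which is an equivalence relation.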

In the following theorem we show that two states $s,t \in S$ are n + 1 step bisimilar iff
$\mathrm{OneStepBis}(s,t,Q^n)$ and $\mathrm{OneStepBis}(t,s,Q^n)$ are feasible.

Theorem 4.5. For all turn-based game structures and MDPs G, for all $n \ge 0$, given
two states $s,t \in S$ and an n-step bisimulation partition of states $Q^n$ such that $\forall V \in Q^n$,
$\forall u,v \in V$, $[u \simeq_g v]^{n} = 0$, the following holds,

$[s \simeq_g t]^{n+1} = 0$ iff $\mathrm{OneStepBis}(s,t,Q^n)$ and $\mathrm{OneStepBis}(t,s,Q^n)$ are both feasible.

Proof. We proceed by induction on n. Assume the result holds for all iteration steps up to
n and consider the case for n + 1. In one direction, if $[s \simeq_g t]^{n+1} = 0$, then $[s \preceq_1 t]^{n+1} =
[t \preceq_1 s]^{n+1} = 0$ by the definition of the bisimulation metric. We need to show that given
$[s \preceq_1 t]^{n+1} = 0$, $\mathrm{OneStepBis}(s,t,Q^n)$ is feasible. The proof is identical for $[t \preceq_1 s]^{n+1} = 0$.
From the definition of the n + 1 step simulation distance, given $p(s,t) = 0$ by our induction
hypothesis, we have,

$$\forall a \in \Gamma_1(s). \; \inf_{x \in \mathcal{D}_1(t)} \sup_{k \in C(d^n)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x}(k)\big) \le 0. \qquad (4.7)$$

Consider a player 1 move $a \in \Gamma_1(s)$. Since we can interchange the order of the inf and sup
by the generalized minimax theorem in $\inf_{x \in \mathcal{D}_1(t)} \sup_{k \in C(d^n)} (\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x}(k))$, the optimal
values of $x \in \mathcal{D}_1(t)$ and $k \in C(d^n)$ exist and only depend on a. Let $x_a$ and $k_a$ be the optimal
values of x and k that realize the inf and sup in this expression. Using
$x_a$ and $k_a$ in (4.7), and writing $k_a(V)$ for the common value of $k_a$ on a class V, we have:

$$\mathbb{E}_t^{x_a}(k_a) \ge \mathbb{E}_s^{a}(k_a)$$

$$\sum_{u \in S} \delta(t,x_a)(u) \cdot k_a(u) \ge \sum_{v \in S} \delta(s,a)(v) \cdot k_a(v)$$

$$\sum_{V \in Q^n} \sum_{u \in V} \delta(t,x_a)(u) \cdot k_a(u) \ge \sum_{V \in Q^n} \sum_{v \in V} \delta(s,a)(v) \cdot k_a(v) \qquad (4.8)$$

$$\sum_{V \in Q^n} k_a(V) \sum_{u \in V} \delta(t,x_a)(u) \ge \sum_{V \in Q^n} k_a(V) \sum_{v \in V} \delta(s,a)(v) \qquad (4.9)$$

$$\forall V \in Q^n. \Big( \sum_{u \in V} \delta(t,x_a)(u) \ge \sum_{u \in V} \delta(s,a)(u) \Big), \qquad (4.10)$$

where (4.9) follows from (4.8) by noting that for all $V \in Q^n$, for all states $u,v \in V$,
$d^n(u,v) = d^n(v,u) = 0$, by our hypothesis, leading to $k(u) - k(v) \le d^n(u,v) = 0$ and
$k(v) - k(u) \le d^n(v,u) = 0$, which implies $k(u) = k(v)$ for all $k \in C(d^n)$. To show that
(4.10) follows from (4.9), assume towards a contradiction that there exists a $V' \in Q^n$
such that $\sum_{u \in V'} \delta(t,x_a)(u) < \sum_{u \in V'} \delta(s,a)(u)$. Then there must be a $V'' \in Q^n$ such that
$\sum_{u \in V''} \delta(t,x_a)(u) > \sum_{u \in V''} \delta(s,a)(u)$, since $\delta(t,x_a)$ is a probability distribution and the
sum of the probability mass allocated to each equivalence class should be 1. Further, for
all $V \in Q^n$, for all $u,v \in V$, we have $d^n(u,v) = d^n(v,u) = 0$ and for all $u \in V$ and for all
$w \in S \setminus V$, we have $d^n(u,w) = d^n(w,u) = 1$. Therefore, we can pick a feasible $k' \in C(d^n)$
such that $k'(v) > 0$ for all $v \in V'$ and $k'(u) = 0$ for all other states. Using $k'$ we get
$\mathbb{E}_s^{a}(k') - \mathbb{E}_t^{x_a}(k') > 0$, which means $k_a$ is not optimal, contradicting (4.7).

In the other direction, assume that $\mathrm{OneStepBis}(s,t,Q^n)$ is feasible. We need to show
that $[s \preceq_1 t]^{n+1} = 0$. Since $\mathrm{OneStepBis}(s,t,Q^n)$ is feasible, there exists a distribution
$x_a \in \mathcal{D}_1(t)$ for all $a \in \Gamma_1(s)$ such that $\forall V \in Q^n. (\sum_{u \in V} \delta(t,x_a)(u) \ge \sum_{u \in V} \delta(s,a)(u))$. By
our induction hypothesis, this implies that for all $k \in C(d^n)$, we have $(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x_a}(k)) \le 0$
and in particular $\sup_{k \in C(d^n)} (\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x_a}(k)) \le 0$. Since $p(s,t) = 0$ by our hypothesis and
we have shown,

$$\forall a \in \Gamma_1(s). \; \inf_{x \in \mathcal{D}_1(t)} \sup_{k \in C(d^n)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x}(k)\big) \le 0,$$

we have, from Lemma 1,

$$[s \preceq_1 t]^{n+1} = p(s,t) \sqcup \sup_{a \in \Gamma_1(s)} \inf_{x \in \mathcal{D}_1(t)} \sup_{k \in C(d^n)} \big(\mathbb{E}_s^{a}(k) - \mathbb{E}_t^{x}(k)\big) = 0.$$

In a similar fashion, if $\mathrm{OneStepBis}(t,s,Q^n)$ is feasible then $[t \preceq_1 s]^{n+1} = 0$, which leads to
$[s \simeq_g t]^{n+1} = 0$ by the definition of the bisimulation metric, as required. ■
Complexity. The number of partition refinement steps required for the computation of
both the simulation and the bisimulation kernel is bounded by $O(|S|^2)$ for turn-based games
and MDPs, where S is the set of states. At every refinement step, at most $O(|S|^2)$ state
pairs are considered, and for each state pair (s, t) at most $|\Gamma(s)|$ LP feasibility problems
need to be solved. Let us denote by LPF(n, m) the complexity of solving the feasibility of
m linear inequalities over n variables. We obtain the following result.

Theorem 4.6. For all turn-based game structures and MDPs G, the following assertions 
hold: 

(1) the simulation kernel can be computed in $O(n^4 \cdot m \cdot \mathrm{LPF}(n^2 + m, n^2 + 2n + m + 2))$ time;

(2) the bisimulation kernel can be computed in $O(n^4 \cdot m \cdot \mathrm{LPF}(m, n + m + 1))$ time;
where $n = |S|$ is the size of the state space, and $m = \max_{s \in S} |\Gamma(s)|$.

Remark 1. The best known algorithm for LPF(n, m) works in time $O(n^{2.5} \cdot \log(n))$ [34]
(assuming each arithmetic operation takes unit time). The previous algorithm for the
bisimulation kernel checked two-way simulation and hence has the complexity $O(n^4 \cdot m \cdot
(n^2 + m)^{2.5} \cdot \log(n^2 + m))$, whereas our algorithm works in time $O(n^4 \cdot m \cdot m^{2.5} \cdot \log(m))$. For
most practical purposes, the number of moves at a state is constant (i.e., m is constant).
For the case when m is constant, the previous best known algorithm worked in $O(n^9 \cdot \log(n))$
time, whereas our algorithm works in time $O(n^4)$.
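The two constant-m bounds quoted in the remark can be checked by simplifying the two expressions directly. The following short derivation treats m as a constant and uses only the bounds stated above:

```latex
\begin{align*}
n^4 \cdot m \cdot (n^2 + m)^{2.5} \cdot \log(n^2 + m)
  &= O\big(n^4 \cdot (n^2)^{2.5} \cdot \log(n^2)\big)
   = O\big(n^4 \cdot n^5 \cdot \log n\big)
   = O\big(n^9 \cdot \log n\big), \\
n^4 \cdot m \cdot m^{2.5} \cdot \log(m)
  &= O\big(n^4\big).
\end{align*}
```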

5. Algorithms for Concurrent Games 

In this section we first show that the computation of the metric distance is at least 
as hard as the computation of optimal values in concurrent reachability games. The exact 
complexity of the latter is open, but it is known to be at least as hard as the square-root 
sum problem, which is in PSPACE but whose inclusion in NP is a long-standing open 
problem [16, 18]. Next, we present algorithms based on a decision procedure for the theory
of real closed fields, for both checking the bounds of the exact distance and the kernel of the
metrics. Our reduction to the theory of real closed fields removes one quantifier alternation
when compared to the previous known formula (inferred from [11]). This improves the
complexity of the algorithm.

5.1. Reduction of reachability games to metrics. We will use the following terms in
the result. A proposition is a boolean observation variable, and we say a state s is labeled by
a proposition q iff q is true at s. A state t is absorbing in a concurrent game if both players
have only one action available at t, and the next state of t is always t (it is a state with a
self-loop). For a proposition q, let $\Diamond q$ denote the set of paths that visit a state labeled by
q at least once. In concurrent reachability games, the objective is $\Diamond q$, for a proposition q,
and without loss of generality all states labeled by q are absorbing states.

Theorem 4. Consider a concurrent game structure G, with a single proposition q, such
that all states labeled by q are absorbing states. We can construct in linear time a concurrent
game structure G', with one additional state t', such that for all $s \in S$, we have

$$[s \preceq_1 t'] = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr\nolimits_s^{\pi_1,\pi_2}(\Diamond q).$$

Proof. The concurrent game structure G' is obtained from G by adding an absorbing
state t'. The states that are not labeled by q, and the additional state t', are labeled
by its complement $\neg q$. Observe there is only one proposition sequence from t', and it
is $(\neg q)^{\omega}$. To prove the desired claim we show that for all $s \in S$ we have $[s \preceq_1 t'] =
\sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr_s^{\pi_1,\pi_2}(\Diamond q)$. From a state s in G the possible proposition sequences can
be expressed as the following $\omega$-regular expression: $(\neg q)^{\omega} \cup (\neg q)^{*} \cdot q^{\omega}$. Since the proposition
sequence from t' is $(\neg q)^{\omega}$, the supremum of the difference in values over $q\mu$ formulas at s
and t' is obtained by satisfying the set of paths formalized as $(\neg q)^{*} \cdot q^{\omega}$ at s. The set of
paths defined as $(\neg q)^{*} \cdot q^{\omega}$ is the same as reaching q in any number of steps, since all states
labeled by q are absorbing. Hence,

$$\sup_{\varphi \in q\mu} \big([\![\varphi]\!](s) - [\![\varphi]\!](t')\big) = [\![\mu X.(q \vee \mathrm{Pre}_1(X))]\!](s).$$

It follows from the results of [10] that for all $s \in S$ we have,

$$[\![\mu X.(q \vee \mathrm{Pre}_1(X))]\!](s) = \sup_{\pi_1 \in \Pi_1} \inf_{\pi_2 \in \Pi_2} \Pr\nolimits_s^{\pi_1,\pi_2}(\Diamond q).$$

From the above equalities and the logical characterization result (2.3) we obtain the desired
result. ■

5.2. Algorithms for the metrics. We first prove a lemma that helps to obtain reduced-complexity
algorithms for concurrent games. The lemma states that the distance $[s \preceq_1 t]$
is attained by restricting player 2 to pure moves at state t, for all states $s,t \in S$.

Lemma 3. For all concurrent game structures G and all metrics $d \in \mathcal{M}$, we have,

$$\sup_{k \in C(d)} \sup_{x_1 \in \mathcal{D}_1(s)} \inf_{y_1 \in \mathcal{D}_1(t)} \sup_{y_2 \in \mathcal{D}_2(t)} \inf_{x_2 \in \mathcal{D}_2(s)} \big(\mathbb{E}_s^{x_1,x_2}(k) - \mathbb{E}_t^{y_1,y_2}(k)\big)$$

$$= \sup_{k \in C(d)} \sup_{x_1 \in \mathcal{D}_1(s)} \inf_{y_1 \in \mathcal{D}_1(t)} \sup_{b \in \Gamma_2(t)} \inf_{x_2 \in \mathcal{D}_2(s)} \big(\mathbb{E}_s^{x_1,x_2}(k) - \mathbb{E}_t^{y_1,b}(k)\big). \qquad (5.1)$$

Proof. To prove our claim we fix $k \in C(d)$, and player 1 mixed moves $x \in \mathcal{D}_1(s)$ and
$y \in \mathcal{D}_1(t)$. We then have,

$$\sup_{y_2 \in \mathcal{D}_2(t)} \inf_{x_2 \in \mathcal{D}_2(s)} \big(\mathbb{E}_s^{x,x_2}(k) - \mathbb{E}_t^{y,y_2}(k)\big) = \inf_{x_2 \in \mathcal{D}_2(s)} \mathbb{E}_s^{x,x_2}(k) - \inf_{y_2 \in \mathcal{D}_2(t)} \mathbb{E}_t^{y,y_2}(k) \qquad (5.2)$$

$$= \inf_{x_2 \in \mathcal{D}_2(s)} \mathbb{E}_s^{x,x_2}(k) - \inf_{b \in \Gamma_2(t)} \mathbb{E}_t^{y,b}(k) \qquad (5.3)$$

$$= \sup_{b \in \Gamma_2(t)} \inf_{x_2 \in \mathcal{D}_2(s)} \big(\mathbb{E}_s^{x,x_2}(k) - \mathbb{E}_t^{y,b}(k)\big),$$

where (5.3) follows from (5.2) since the decomposition on the right hand side of (5.2) yields two
independent linear optimization problems; the optimal values are attained at a vertex of the
convex hulls of the distributions induced by pure player 2 moves at the two states. This
easily leads to the result. ■
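The vertex argument used for step (5.3) can be illustrated numerically: for fixed k and y, the expectation is linear in player 2's mixed move, so its infimum over the simplex is attained at a pure move. The sketch below compares a grid search over mixed moves with the minimum over pure moves; the payoff values are hypothetical stand-ins for $\mathbb{E}_t^{y,b}(k)$, and the grid search is an illustration rather than a proof.

```python
from fractions import Fraction
from itertools import product

def expected_value(weights, payoffs):
    """Expectation of the pure-move payoffs under a mixed move."""
    return sum(w * p for w, p in zip(weights, payoffs))

def simplex_grid(n, steps):
    """All distributions over n moves with probabilities in multiples of 1/steps."""
    for combo in product(range(steps + 1), repeat=n):
        if sum(combo) == steps:
            yield [Fraction(c, steps) for c in combo]

# Hypothetical payoffs, one per pure player 2 move b at state t.
payoffs = [Fraction(3, 4), Fraction(1, 5), Fraction(1, 2)]

inf_over_mixed = min(expected_value(y2, payoffs) for y2 in simplex_grid(3, 10))
inf_over_pure = min(payoffs)
```

Both quantities coincide, which is why the inner optimization over $\mathcal{D}_2(t)$ can be replaced by a finite choice over $\Gamma_2(t)$.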
We now present algorithms for metrics in concurrent games. Due to the reduction
from concurrent reachability games, shown in Theorem 4, it is unlikely that we have an
algorithm in NP for the metric distance between states. We therefore construct statements
in the theory of real closed fields, firstly to decide whether $[s \preceq_1 t] \le r$, for a rational r, so
that we can approximate the metric distance between states s and t, and secondly to decide
if $[s \preceq_1 t] = 0$ in order to compute the kernel of the game simulation and bisimulation
metrics.

The statements improve on the complexity that can be achieved by a direct translation
of the statements of [11] to the theory of real closed fields. The complexity reduction is
based on the observation that using Lemma 3, we can replace a sup operator with finite
conjunction, and therefore reduce the quantifier complexity of the resulting formula. Fix
a game structure G and states s and t of G. We proceed to construct a statement in the
theory of reals that can be used to decide if $[s \preceq_1 t] \le r$, for a given rational r.

In the following, we use variables $x_1$, $y_1$ and $x_2$ to denote the sets of variables $\{x_1(a) \mid a \in
\Gamma_1(s)\}$, $\{y_1(a) \mid a \in \Gamma_1(t)\}$ and $\{x_2(b) \mid b \in \Gamma_2(s)\}$ respectively. We use k to denote the set
of variables $\{k(u) \mid u \in S\}$, and d for the set of variables $\{d(u,v) \mid u,v \in S\}$. The variables
$\alpha, \alpha', \beta, \beta'$ range over reals. For convenience, we assume $\Gamma_2(t) = \{b_1, \dots, b_l\}$.

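First, notice that we can write formulas that state that a variable x is a mixed move
for a player at state s, and k is a constructible predicate (i.e., $k \in C(d)$):

$$\mathrm{IsDist}(x, \Gamma_1(s)) \equiv \bigwedge_{a \in \Gamma_1(s)} x(a) \ge 0 \;\wedge \bigwedge_{a \in \Gamma_1(s)} x(a) \le 1 \;\wedge \sum_{a \in \Gamma_1(s)} x(a) = 1,$$

$$\mathrm{kBounded}(k, d) \equiv \bigwedge_{u \in S} \big(k(u) \ge \theta_1 \wedge k(u) \le \theta_2\big) \;\wedge \bigwedge_{u,v \in S} \big(k(u) - k(v) \le d(u,v)\big).$$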



In the following, we write bounded quantifiers of the form "$\exists x_1 \in \mathcal{D}_1(s)$" or "$\forall k \in C(d)$",
which mean respectively $\exists x_1.\mathrm{IsDist}(x_1, \Gamma_1(s)) \wedge \cdots$ and $\forall k.\mathrm{kBounded}(k, d) \to \cdots$.

Let $\eta(k, x_1, x_2, y_1, b)$ be the polynomial $E_s^{x_1,x_2}(k) - E_t^{y_1,b}(k)$. Notice that
$\eta$ is a polynomial of degree 3. We write $a = \max\{a_1, \ldots, a_l\}$ for variables
$a, a_1, \ldots, a_l$ for the formula

$$\Bigl( a = a_1 \wedge \bigwedge_{i=1}^{l} a_1 \geq a_i \Bigr) \vee \cdots \vee \Bigl( a = a_l \wedge \bigwedge_{i=1}^{l} a_l \geq a_i \Bigr) .$$
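
To make the degree-3 claim concrete, the following Python sketch evaluates $\eta$ for a small
game. It is an illustration only: the dictionary encoding of the transition function
(`delta[(state, a, b)]` maps successor states to probabilities) and the toy game are
assumptions made here. Each summand multiplies one variable of $x_1$ (or $y_1$), one
variable of $x_2$, and one variable of $k$, with a transition probability as coefficient.

```python
def expectation(delta, state, p1, p2, k):
    """E_state^{p1,p2}(k): sum over a, b, u of p1(a) * p2(b) * delta(state,a,b)(u) * k(u)."""
    return sum(p1[a] * p2[b] * prob * k[u]
               for a in p1
               for b in p2
               for u, prob in delta[(state, a, b)].items())


def eta(delta, s, t, k, x1, x2, y1, b):
    """eta(k, x1, x2, y1, b): player 2 plays the pure move b at state t."""
    return (expectation(delta, s, x1, x2, k)
            - expectation(delta, t, y1, {b: 1.0}, k))


# Hypothetical game: player 1 has move "a", player 2 has moves "c" and "d".
delta = {
    ("s", "a", "c"): {"u": 1.0},
    ("s", "a", "d"): {"v": 1.0},
    ("t", "a", "c"): {"u": 0.5, "v": 0.5},
    ("t", "a", "d"): {"v": 1.0},
}
k = {"s": 0.0, "t": 0.0, "u": 1.0, "v": 0.0}
x1, y1 = {"a": 1.0}, {"a": 1.0}
x2 = {"c": 0.5, "d": 0.5}
print(eta(delta, "s", "t", k, x1, x2, y1, "c"))
print(eta(delta, "s", "t", k, x1, x2, y1, "d"))
```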

We construct the formula for game simulation in stages. First, we construct a formula
$\Phi_1(d, s, t, k, x_1, \alpha)$ with free variables $d, k, x_1, \alpha$ such that
$\Phi_1(d, s, t, k, x_1, \alpha)$ holds for a valuation to the variables iff

$$\alpha = \inf_{y_1 \in \mathcal{D}_1(t)} \; \sup_{b \in \Gamma_2(t)} \; \inf_{x_2 \in \mathcal{D}_2(s)} \bigl( E_s^{x_1,x_2}(k) - E_t^{y_1,b}(k) \bigr) .$$

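We use the following observation to move the innermost inf ahead of the sup over the finite
set $\Gamma_2(t)$ (for a function $f$):

$$\sup_{b \in \Gamma_2(t)} \; \inf_{x_2 \in \mathcal{D}_2(s)} f(b, x_2, x) \;=\; \inf_{x_2^{b_1} \in \mathcal{D}_2(s)} \cdots \inf_{x_2^{b_l} \in \mathcal{D}_2(s)} \max\bigl( f(b_1, x_2^{b_1}, x), \ldots, f(b_l, x_2^{b_l}, x) \bigr) .$$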
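
The identity can be checked by brute force when both sets are finite. The Python sketch
below is an illustration under stated assumptions: the finite grid `X` stands in for the
set of mixed moves $\mathcal{D}_2(s)$, the function `f` is an arbitrary test function,
and the extra argument $x$ is held fixed and therefore omitted.

```python
from itertools import product


def sup_inf(f, B, X):
    """sup over b in B of inf over x in X of f(b, x)."""
    return max(min(f(b, x) for x in X) for b in B)


def inf_max(f, B, X):
    """inf over independent copies x^b (one per b in B) of max over b of f(b, x^b)."""
    return min(max(f(b, xb) for b, xb in zip(B, choice))
               for choice in product(X, repeat=len(B)))


def f(b, x):
    # Arbitrary test function; each b has a different minimizer on the grid.
    if b == "b1":
        return (x - 0.25) ** 2 + 0.1
    if b == "b2":
        return 1 - x / 2
    return x + 0.2


B = ["b1", "b2", "b3"]
X = [i / 4 for i in range(5)]  # grid on [0, 1]
print(sup_inf(f, B, X), inf_max(f, B, X))
```

The right-hand side uses one copy of the inner variable per element of the finite set,
which is why the number of $x_2$ variables grows to $|\Gamma_2(s)| \cdot |\Gamma_2(t)|$.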



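The formula $\Phi_1(d, s, t, k, x_1, \alpha)$ is given by:

$$\begin{aligned}
& \forall y_1 \in \mathcal{D}_1(t).\ \forall x_2^{b_1} \in \mathcal{D}_2(s) \ldots x_2^{b_l} \in \mathcal{D}_2(s).\ \forall w_1 \ldots w_l.\ \forall a.\ \forall \alpha'. \\
& \exists \hat{y}_1 \in \mathcal{D}_1(t).\ \exists \hat{x}_2^{b_1} \in \mathcal{D}_2(s) \ldots \hat{x}_2^{b_l} \in \mathcal{D}_2(s).\ \exists \hat{w}_1 \ldots \hat{w}_l.\ \exists \hat{a}. \\
& \quad \Bigl[ \Bigl( w_1 = \eta(k, x_1, x_2^{b_1}, y_1, b_1) \wedge \cdots \wedge w_l = \eta(k, x_1, x_2^{b_l}, y_1, b_l) \wedge a = \max\{w_1, \ldots, w_l\} \Bigr) \rightarrow (a \geq \alpha) \Bigr] \\
& \quad \wedge \\
& \quad \Bigl[ \Bigl( \bigl( \hat{w}_1 = \eta(k, x_1, \hat{x}_2^{b_1}, \hat{y}_1, b_1) \wedge \cdots \wedge \hat{w}_l = \eta(k, x_1, \hat{x}_2^{b_l}, \hat{y}_1, b_l) \wedge \hat{a} = \max\{\hat{w}_1, \ldots, \hat{w}_l\} \bigr) \rightarrow \hat{a} \geq \alpha' \Bigr) \rightarrow (\alpha \geq \alpha') \Bigr]
\end{aligned}$$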

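Using $\Phi_1$, we construct a formula $\Phi(d, s, t, \alpha)$ with free variables $d$ and $\alpha$ such
that $\Phi(d, s, t, \alpha)$ is true iff:

$$\alpha = \sup_{k \in C(d)} \; \sup_{x_1 \in \mathcal{D}_1(s)} \; \inf_{y_1 \in \mathcal{D}_1(t)} \; \sup_{b \in \Gamma_2(t)} \; \inf_{x_2 \in \mathcal{D}_2(s)} \bigl( E_s^{x_1,x_2}(k) - E_t^{y_1,b}(k) \bigr) .$$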

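The formula $\Phi$ is defined as follows:

$$\begin{aligned}
& \forall k \in C(d).\ \forall x_1 \in \mathcal{D}_1(s).\ \forall \beta.\ \forall \alpha'. \\
& \quad \bigl[ \Phi_1(d, s, t, k, x_1, \beta) \rightarrow (\beta \leq \alpha) \bigr] \;\wedge \\
& \quad \bigl[ \bigl( \forall k' \in C(d).\ \forall x_1' \in \mathcal{D}_1(s).\ \forall \beta'.\ \Phi_1(d, s, t, k', x_1', \beta') \rightarrow \beta' \leq \alpha' \bigr) \rightarrow \alpha \leq \alpha' \bigr]
\end{aligned} \qquad (5.4)$$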

Finally, given a rational $r$, we can check if $[s \preceq_1 t] \leq r$ by checking if the following
sentence is true:

$$\exists d \in \mathcal{M}.\ \exists a \in \mathcal{M}.\ \Bigl[ \Bigl( \bigwedge_{u,v \in S} \Phi(d, u, v, a(u,v)) \wedge \bigl( d(u,v) = a(u,v) \bigr) \Bigr) \wedge \bigl( d(s,t) \leq r \bigr) \Bigr] . \qquad (5.5)$$

The above sentence is true iff in the least fixpoint, $d(s,t)$ is bounded by $r$. As in the case of
turn-based games and MDPs, given a rational $\epsilon > 0$, using binary search and
$O(\log(\frac{\theta_2 - \theta_1}{\epsilon}))$ calls to a decision procedure to check the sentence (5.5),
we can compute an interval $[l,u]$ with $u - l \leq \epsilon$, such that $[s \preceq_1 t] \in [l,u]$.

Complexity. Note that $\Phi$ is of the form $\forall\exists\forall$, because $\Phi_1$ is of the form
$\forall\exists$, and appears in negative position in $\Phi$. The formula $\Phi$ has
$(|S| + |\Gamma_1(s)| + 3)$ universally quantified variables, followed by
$(|S| + |\Gamma_1(s)| + 3 + 2(|\Gamma_1(t)| + |\Gamma_2(s)| \cdot |\Gamma_2(t)| + |\Gamma_2(t)| + 2))$
existentially quantified variables, followed by
$2(|\Gamma_1(t)| + |\Gamma_2(s)| \cdot |\Gamma_2(t)| + |\Gamma_2(t)| + 1)$ universal variables. The sentence
(5.5) introduces $|S|^2 + |S|^2$ existentially quantified variables ahead of $\Phi$. The matrix of the
formula is of length at most quadratic in the size of the game, and the maximum degree
of any polynomial in the formula is 3. We define the size of a game $G$ as $|G| = |S| + T$,
where $T = \sum_{s,t \in S} \sum_{a,b \in Moves} |\delta(s,a,b)(t)|$. Using the complexity of deciding a
formula in the theory of real closed fields [3], which states that a formula with $i$ quantifier
blocks, where each block has $l_i$ variables, of $p$ polynomials, has a time complexity bound of
$O(p^{O(\prod_i (l_i + 1))})$, we get the following result.

Theorem 5.1. (Decision complexity for exact distance). For all concurrent game structures
$G$, given a rational $r$, and two states $s$ and $t$, whether $[s \preceq_1 t] \leq r$ can be decided
in time $O(|G|^{O(|G|^5)})$.
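
To see how the block sizes enter the bound, the following Python sketch plugs the variable
counts stated in the complexity paragraph into the product $\prod_i (l_i + 1)$ that appears
in the exponent. It is arithmetic only; the function names and the example sizes are
assumptions made here, and the result is the exponent term, not a running time.

```python
def block_sizes(n_states, g1_s, g1_t, g2_s, g2_t):
    """Sizes of the four quantifier blocks of sentence (5.5), outermost first.

    n_states = |S|, g1_s = |Gamma_1(s)|, g1_t = |Gamma_1(t)|,
    g2_s = |Gamma_2(s)|, g2_t = |Gamma_2(t)|.
    """
    inner = g1_t + g2_s * g2_t + g2_t
    return [
        2 * n_states ** 2,                      # exists d, a introduced by (5.5)
        n_states + g1_s + 3,                    # leading universal block of Phi
        n_states + g1_s + 3 + 2 * (inner + 2),  # existential block of Phi
        2 * (inner + 1),                        # trailing universal block of Phi
    ]


def exponent_product(blocks):
    """Product of (l_i + 1) over all quantifier blocks."""
    result = 1
    for size in blocks:
        result *= size + 1
    return result


# Hypothetical game with 2 states and 2 moves per player at both s and t.
blocks = block_sizes(2, 2, 2, 2, 2)
print(blocks, exponent_product(blocks))
```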

Approximation. Given a rational $\epsilon > 0$, using binary search and
$O(\log(\frac{\theta_2 - \theta_1}{\epsilon}))$ calls to check the formula (5.5), we can obtain an
interval $[l,u]$ with $u - l \leq \epsilon$ such that $[s \preceq_1 t]$ lies in the interval $[l,u]$.

Corollary 3. (Approximation for exact distance). For all concurrent game structures
$G$, given a rational $\epsilon$, and two states $s$ and $t$, an interval $[l,u]$ with $u - l \leq \epsilon$
such that $[s \preceq_1 t] \in [l,u]$ can be computed in time
$O(\log(\frac{\theta_2 - \theta_1}{\epsilon}) \cdot |G|^{O(|G|^5)})$.
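
The binary search can be sketched as follows. This is a minimal Python illustration, not
the implementation of the decision procedure: `decide` is a hypothetical oracle answering
whether the distance is at most $r$ (in the text, a call to a decision procedure for the
theory of real closed fields), and the search range $[0, \theta_2 - \theta_1]$ is the
assumed range of the distance.

```python
def approximate(decide, theta1, theta2, eps):
    """Return (l, u, calls) with l <= distance <= u and u - l <= eps.

    decide(r) is assumed to return True iff distance <= r.
    """
    lo, hi, calls = 0.0, theta2 - theta1, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        calls += 1
        if decide(mid):
            hi = mid   # distance <= mid
        else:
            lo = mid   # distance > mid
    return lo, hi, calls


# Stand-in oracle for a hypothetical true distance of 0.3.
l, u, calls = approximate(lambda r: 0.3 <= r, 0.0, 1.0, 1 / 64)
print(l, u, calls)
```

Each iteration halves the interval, so the number of oracle calls is logarithmic in
$(\theta_2 - \theta_1)/\epsilon$.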

In contrast, the formula to check whether $[s \preceq_1 t] \leq r$, for a rational $r$, as implied by
the definition of $H_{\preceq_1}(d)(s,t)$, that does not use the lemma replacing the inner sup by a
finite conjunction, has five quantifier alternations due to the inner sup. Combined with the
$2 \cdot |S|^2$ existentially quantified variables in the sentence (5.5), this yields a decision
complexity of $O(|G|^{O(|G|^7)})$.

5.3. Computing the kernels. Similar to the case of turn-based games and MDPs, the
kernel of the simulation metric for concurrent games can be computed as the limit of
the series $\preceq^0, \preceq^1, \ldots$ of relations. For all $s,t \in S$, we have
$(s,t) \in \preceq^0$ iff $s \equiv t$. For all $n \geq 0$, we have $(s,t) \in \preceq^{n+1}$ iff the
following sentence $\Phi_s$ is true:

$$\forall a.\ \Phi(d^n, s, t, a) \rightarrow a = 0 ,$$

where $\Phi$ is defined as in (5.4) and, at step $n$ in the iteration, the distance between any pair
of states $u, v \in S$ is defined as follows:

$$\forall u,v \in S.\quad d^n(u,v) = \begin{cases} 0 & \text{if } (u,v) \in \preceq^n \\ 1 & \text{if } (u,v) \notin \preceq^n \end{cases}$$

To compute the bisimulation kernel, we again proceed by partition refinement. For a sequence
of partitions $Q^0, Q^1, \ldots$, where $s, t \in V$ for $V \in Q^n$ implies $(s,t) \in \simeq^n$, we
have $(s,t) \in \simeq^{n+1}$ iff the following sentence is true for the state pairs $(s,t)$ and $(t,s)$:

$$\forall a.\ \Phi(d^n, s, t, a) \rightarrow a = 0 ,$$

where $\Phi$ is again as defined in (5.4) and, at step $n$ in the iteration, the distance between any
pair of states $u, v \in S$ is defined as follows:

$$\forall u,v \in S.\quad d^n(u,v) = \begin{cases} 0 & \text{if } (u,v) \in \simeq^n \\ 1 & \text{if } (u,v) \notin \simeq^n \end{cases}$$

Complexity. In the worst case we need $O(|S|^2)$ partition refinement steps for computing
both the simulation and the bisimulation relation. At each partition refinement step the
number of state pairs we consider is bounded by $O(|S|^2)$. We can check whether $\Phi_s$ and
its bisimulation counterpart are true using a decision procedure for the theory of real closed
fields. Therefore, we need $O(|S|^4)$ decisions to compute the kernels. The partitioning of
states based on the decisions can be done by any of the partition refinement algorithms,
such as [25].

Theorem 5.2. For all concurrent game structures $G$, states $s$ and $t$, whether $s \preceq_1 t$ can
be decided in $O(|G|^{O(|G|^3)})$ time, and whether $s \simeq_g t$ can be decided in
$O(|G|^{O(|G|^3)})$ time.
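
The iteration for the simulation kernel can be sketched as a fixpoint loop around a
decision oracle. The Python code below is an illustration under stated assumptions: `holds`
is a hypothetical stand-in for deciding the sentence $\forall a.\ \Phi(d^n, s, t, a) \rightarrow a = 0$,
the loop only re-examines pairs that are still in the current relation, and the toy oracle
is the classical simulation step for a deterministic labelled system rather than a
real-closed-field procedure.

```python
def kernel(states, initial, holds):
    """Refine `initial` until no pair is removed; returns the limit relation."""
    relation = set(initial)
    while True:
        # d^n(u, v) = 0 if (u, v) is in the current relation, and 1 otherwise.
        d = {(u, v): 0 if (u, v) in relation else 1
             for u in states for v in states}
        refined = {(s, t) for (s, t) in relation if holds(d, s, t)}
        if refined == relation:
            return relation
        relation = refined


# Toy deterministic system: labels and a single successor per state.
states = [0, 1, 2, 3]
label = {0: "a", 1: "a", 2: "b", 3: "a"}
succ = {0: 2, 1: 3, 2: 2, 3: 3}

# Initial relation: pairs of states that carry the same label.
initial = {(s, t) for s in states for t in states if label[s] == label[t]}


def toy_holds(d, s, t):
    # Stand-in oracle: the successors must be at distance 0.
    return d[(succ[s], succ[t])] == 0


result = kernel(states, initial, toy_holds)
print(sorted(result))
```

With $O(|S|^2)$ pairs and at most $O(|S|^2)$ rounds, the loop makes $O(|S|^4)$ oracle calls,
matching the count of decisions given above.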

6. Conclusion: Possible Applications and Open Problems 

We have shown theoretical applications of game metrics with respect to discounted and 
long-run average values of games. An interesting question regarding game metrics is related 
to their usefulness in real-world applications. We now discuss possible applications of game 
metrics. 

 • State space reduction. The kernels of the metrics are the simulation and bisimulation
 relations. These relations have been well studied in the context of transition
 systems with applications in program analysis and verification. For example, in
 [19] the authors show that bisimulation-based state space reduction is practical and
 may result in an enormous reduction in model size, speeding up model checking of
 probabilistic systems.

 • Security. Bisimulation plays a critical role in the formal analysis of security protocols.
 If two instances of a protocol, parameterized by a message $m$, are bisimilar
 for messages $m$ and $m'$, then the messages remain secret [7]. The authors use
 bisimulation in probabilistic transition systems to analyze probabilistic anonymity
 in security protocols.

 • Computational Biology. In the emerging area of computational systems biology,
 the authors of [30] use the metrics defined in the context of probabilistic systems
 [12, 32, 33] to compare reduced models of Stochastic Reaction Networks. These reaction
 networks are used to study intra-cellular behavior in computational systems
 biology. The reduced models are Continuous Time Markov Chains (CTMCs), and
 the comparison of different reduced models is via the metric distance between their
 initial states. A central question in the study of intra-cellular behavior is estimating
 the sizes of populations of various species that cohabitate cells. The inter-cellular
 dynamics in this context is modeled as a stochastic process, representing the temporal
 evolution of the species' populations, represented by a family $(X(t))_{t \geq 0}$ of
 random vectors. For each species index $i$, with $N$ the number of different species,
 $X_i(t)$ is the population of species $S_i$ at time $t$. In [26], the authors show how CTMCs
 that model system dynamics can be reduced to Discrete Time Markov Chains (DTMCs)
 using a technique called uniformization or discrete-time conversion. The DTMCs
 are stochastically identical to the CTMCs and enable more efficient estimation of
 species' populations. An assumption that is made in these studies is that systems are
 spatially homogeneous and thermally equilibrated; the molecules are well stirred in
 a fixed volume at a constant temperature. These assumptions enable the reduction
 of these systems to CTMCs and to DTMCs in some cases.
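
For readers unfamiliar with uniformization, the following Python sketch shows the textbook
construction $P = I + Q/\Lambda$ for a CTMC with generator matrix $Q$ and a uniformization
rate $\Lambda$ at least as large as every exit rate. It is an illustration only; the function
name, the list-of-lists encoding, and the example generator are assumptions made here, and
the construction used in [26] may differ in its details.

```python
def uniformize(Q, rate=None):
    """Return (P, rate), where P = I + Q / rate is a stochastic matrix.

    Q is a CTMC generator: nonnegative off-diagonal entries, rows summing to 0.
    """
    n = len(Q)
    max_exit_rate = max(-Q[i][i] for i in range(n))
    if rate is None:
        rate = max_exit_rate
    if rate <= 0 or rate < max_exit_rate:
        raise ValueError("rate must be positive and at least the largest exit rate")
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / rate for j in range(n)]
         for i in range(n)]
    return P, rate


# Hypothetical two-state generator with exit rates 2 and 1.
Q = [[-2.0, 2.0],
     [1.0, -1.0]]
P, rate = uniformize(Q)
print(P, rate)
```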

In the applications we have discussed, non-determinism is modeled probabilistically. In
applications where non-determinism needs to be interpreted demonically, rather than
probabilistically, MDPs or turn-based games would be the appropriate framework for analysis.
If the interaction between various sources of non-determinism needs to be modeled
simultaneously, then concurrent games would be the appropriate framework for analysis. For the
analysis of these general models, our results and algorithms will be useful.

Open Problems. While we have shown polynomial time algorithms for the kernel of the
simulation and bisimulation metrics for MDPs and turn-based games, the existence of a
polynomial time algorithm for the kernel of both the simulation and bisimulation metrics
for concurrent games is an open problem. The existence of a polynomial time algorithm
to approximate the exact metric distance in the case of turn-based games and MDPs is an
open problem. The existence of a PSPACE algorithm for the decision problem of the exact
metric distance in concurrent games is an open problem.